Telecommunications refers to the transmission of information over a distance, connecting people farther apart than the human voice or vision can reach. It can be implemented through various technologies and, depending on how the signals are transmitted, is broadly divided into two categories: wired communication and wireless communication. Over the past several decades, advances in wireless communications have produced successive generations of mobile communication standards: 1G, 2G, 3G, and 4G. These standards define the blueprints for the radio systems and network operations that enable mobile communication and global device connectivity. Now the fifth-generation standard, 5G, has arrived. With multi-Gbps data speeds, ultra-low latency, strong reliability, and massive capacity, 5G aims to significantly improve the user experience.
With the rollout of 5G, the amount of mobile data is growing even faster than before. This growth is driven by the rising number of connected devices and the expansion of new applications and services, such as high-resolution video, augmented reality, and virtual reality. Video-related traffic and social networking data are reportedly expected to increase by more than 24% annually over the next five years, and monthly wireless data consumption is expected to exceed 77 exabytes in 2022. However, traffic load is not evenly distributed across 5G cells, which undermines the network's ability to handle users' huge demands. For example, some studies show that almost 50% of network traffic is carried by only 15% of the cells. Existing systems balance traffic loads with pre-defined rules (referred to as rule-based methods) that depend heavily on empirical knowledge, and they may not be effective at controlling dynamic and complex 5G systems.
To tackle the limitations of rule-based methods, we at Samsung are investigating state-of-the-art Reinforcement Learning (RL) to design AI-driven adaptive control policies for balancing loads in telecommunication systems. These intelligent systems can improve the performance of telecommunication networks. RL is a very hot topic in the AI world, but many of its successes have come on artificial benchmarks or scenarios. Our work not only applies RL to a problem of huge practical value (i.e., load balancing for 5G telecommunication networks), but also represents an exciting example of using RL in the real world.
RL is a machine learning paradigm that aims to learn an optimal control policy by interacting with the system. Figure 1 shows the concept of the RL paradigm. In RL, an agent takes control actions based on its current observations of the environment state and receives corresponding rewards from the environment. Powered by deep learning, deep RL has achieved impressive success in Go and other board games, as well as in real-world applications such as data center cooling and recommender systems. Because of this recent success, RL has been gaining substantial attention, and it has been shown to have significant potential for solving real-world sequential decision-making problems.
Figure 1. Reinforcement Learning (RL) paradigm.
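To make the loop in Figure 1 concrete, here is a minimal sketch of an agent interacting with an environment. Everything in it is an assumption made for illustration: the two-state `ToyEnv`, the reward rule, and the use of tabular Q-learning with epsilon-greedy exploration are not taken from our telecommunication systems, where deep RL is used instead.

```python
import random


class ToyEnv:
    """Hypothetical two-state environment, for illustration only."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Reward 1 when the action differs from the current state, else 0.
        reward = 1.0 if action != self.state else 0.0
        self.state = 1 - self.state  # the state alternates on every step
        return self.state, reward


def train(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: observe state, act, receive reward, update."""
    rng = random.Random(seed)
    env = ToyEnv()
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    state = env.reset()
    for _ in range(steps):
        # Epsilon-greedy: explore occasionally, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q[state][a])
        next_state, reward = env.step(action)
        # Move the estimate toward reward + discounted best future value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train()
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in (0, 1)]
print("greedy action per state:", greedy_policy)
```

The agent is never told the reward rule; it only sees states and rewards, which is the property that makes RL attractive for systems that are hard to model by hand.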
Researchers at SAIC-Montreal are investigating advanced RL algorithms and methodologies to improve telecommunication system performance. We start with the Load Balancing (LB) problem, which we treat as a sequential decision-making problem: we formulate it as a Markov Decision Process (MDP) and use RL to solve it. Based on the current observations of the telecommunication system states, the RL agent outputs a set of control actions corresponding to a group of threshold values for LB features (these features are designed by the Samsung Network Business Unit). Extensive experiments have been carried out to evaluate the effectiveness of the proposed RL-based solution. As shown in Figure 2, the proposed approach significantly outperforms the baseline (rule-based) method on all the critical system KPIs (key performance indicators). Specifically, the RL-based solution achieves better network performance on metrics that include the minimum IP throughput and various statistical measures of how consistent and fair these metrics are among end users. These network measurements reflect the ability of the network to satisfy the needs of its users, and thus to provide a better experience.
Figure 2. Improvement of RL-based LB method over the traditional rule-based method.
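The MDP view of load balancing can be sketched in a few lines. The code below is a toy under stated assumptions, not our production formulation: the state is a list of per-cell loads, the action is a single hypothetical load-gap threshold, the transition moves traffic from the busiest to the least busy cell, and the reward is Jain's fairness index. The real LB features and thresholds are not described in this article.

```python
def jain_fairness(values):
    """Jain's fairness index: 1.0 when all values are equal, 1/n at worst."""
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0:
        return 1.0
    return total * total / (len(values) * squares)


def lb_step(loads, threshold):
    """One toy MDP transition.

    state  : per-cell loads (hypothetical observation)
    action : load-gap threshold above which traffic is shifted (hypothetical)
    reward : Jain's fairness index of the resulting loads (an assumption)
    """
    new_loads = list(loads)
    busiest = max(range(len(new_loads)), key=lambda i: new_loads[i])
    idlest = min(range(len(new_loads)), key=lambda i: new_loads[i])
    gap = new_loads[busiest] - new_loads[idlest]
    if gap > threshold:
        # Shift just enough load to bring the gap down to the threshold.
        moved = (gap - threshold) / 2.0
        new_loads[busiest] -= moved
        new_loads[idlest] += moved
    return new_loads, jain_fairness(new_loads)


before = [0.9, 0.3, 0.3]
after_loose, reward_loose = lb_step(before, threshold=1.0)  # never triggers
after_tight, reward_tight = lb_step(before, threshold=0.2)  # shifts load
print(after_loose, round(reward_loose, 3))
print(after_tight, round(reward_tight, 3))
```

A rule-based method would fix `threshold` in advance; an RL agent instead chooses it at every step from the observed loads, which is what lets the policy adapt as traffic changes.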
When bringing RL to the real world, we will encounter several challenges. One major challenge comes from the various kinds of delays, including but not limited to sensing delays, data aggregation delays, computing delays, and actuation delays. These delays may easily make a control decision obsolete when deployed in the real world. To overcome these delays, predicting the future system status and KPIs becomes a key component of intelligent 5G networks. With the future in our hands, we can safely plan out the network operations to achieve enhanced and robust system performance.
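A small numerical sketch shows why prediction helps with delays. It assumes a steadily rising cell load and a known delay of two time steps between sensing and actuation, and it uses naive linear extrapolation as a stand-in predictor; the function name and the data are hypothetical, and our actual models are considerably more advanced.

```python
def extrapolate(history, delay_steps):
    """Project the last observed trend forward by delay_steps (naive)."""
    slope = history[-1] - history[-2]
    return history[-1] + slope * delay_steps


# Hypothetical cell load over six time steps.
true_load = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
delay_steps = 2

# The controller only sees data up to index 3, but its action takes
# effect at index 5 because of the sensing and actuation delays.
observed = true_load[:4]
load_at_actuation = true_load[3 + delay_steps]

stale_error = abs(load_at_actuation - observed[-1])
predicted_error = abs(load_at_actuation - extrapolate(observed, delay_steps))
print("error using stale observation:", round(stale_error, 3))
print("error using predicted state:  ", round(predicted_error, 3))
```

Acting on the stale observation means controlling a system state that no longer exists, whereas acting on the predicted state targets the conditions the action will actually meet.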
Specifically, researchers at SAIC-Montreal have developed advanced AI models to predict future sequences of system status and KPIs. Unlike predicting a single point in the future, forecasting a sequence allows us to make relatively long-term (e.g., daily or weekly) operation plans with hourly detail. Furthermore, our models provide prediction confidence intervals, which can help operators apply conservative or aggressive operation policies depending on their needs. Figure 3 shows a sequence prediction of cell Physical Resource Block (PRB) usage with two different confidence intervals. In an LTE (Long Term Evolution) or 5G network, a PRB is the smallest element of resource allocation assigned by the eNB/gNB scheduler. The most recent SAIC-Montreal work on traffic prediction was published at IEEE GLOBECOM 2021 and won the Best Paper Award with the title “Traffic Prediction at Heterogeneous 5G Edge with Data-Efficient Transfer Learning”. This method allows operators to reduce data usage by more than 20% when aggregating knowledge from other base stations for a data-limited base station, while still producing accurate predictions.
Figure 3. Predicting the future sequence of cell PRB usage.
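The idea of a sequence forecast with two confidence bands, as in Figure 3, can be sketched with a simple seasonal baseline. This is only an illustration under stated assumptions: the PRB usage data is synthetic, the forecast is the per-hour mean over past days, and the bands assume normally distributed variation around that mean. It is not the transfer-learning model from the paper.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def forecast_with_intervals(history, period=24, levels=(0.80, 0.95)):
    """Forecast the next `period` points with one band per confidence level."""
    days = [history[i:i + period]
            for i in range(0, len(history) - period + 1, period)]
    forecast = []
    for hour in range(period):
        samples = [day[hour] for day in days]
        mu, sigma = mean(samples), stdev(samples)
        bands = {}
        for level in levels:
            # Two-sided interval: z-score of the upper tail quantile.
            z = NormalDist().inv_cdf(0.5 + level / 2.0)
            # PRB usage is a fraction, so clip the band to [0, 1].
            bands[level] = (max(0.0, mu - z * sigma),
                            min(1.0, mu + z * sigma))
        forecast.append((mu, bands))
    return forecast


# Synthetic hourly PRB usage for 7 days: a daily cycle plus small noise.
rng = random.Random(0)
history = [0.4 + 0.25 * math.sin(2 * math.pi * hour / 24)
           + rng.uniform(-0.05, 0.05)
           for _day in range(7) for hour in range(24)]

forecast = forecast_with_intervals(history)
for hour in (0, 6, 12, 18):
    mu, bands = forecast[hour]
    print(hour, round(mu, 3), [round(b, 3) for b in bands[0.95]])
```

An operator planning conservatively might provision against the upper edge of the wider band, while a more aggressive policy could plan against the narrower one.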
Besides the challenge of delays, there are two other major concerns about bringing an RL-based solution to the real world. One is the potential distribution gap between the simulator and the real world. The other is system safety, i.e., whether the RL agent will output dangerous actions or drive the system into dangerous states. Researchers at SAIC-Montreal have designed mechanisms to deal with both concerns. For the first, we have developed a transfer learning-based mechanism that chooses a suitable pre-learned RL-based control policy by measuring the similarity between the system dynamics of the real-world environment and those of the simulator used to learn the policy. For the second, we train the RL agent to take the system states into account when acting and add a safety layer that filters out potentially dangerous actions. With these additional mechanisms, we can better exploit the benefits of RL in real-world telecommunication systems.
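The two mechanisms described above can be sketched as follows. All names and numbers are hypothetical: the three-number "dynamics signature", the Euclidean distance used as a similarity measure, the policy bank, and the range check with a fallback action are assumptions chosen to keep the example short, not the mechanisms we actually deploy.

```python
import math


def dynamics_distance(sig_a, sig_b):
    """Euclidean distance between two dynamics signatures (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def select_policy(real_signature, policy_bank):
    """Pick the pre-learned policy whose simulator looks most like reality."""
    return min(policy_bank,
               key=lambda name: dynamics_distance(real_signature,
                                                  policy_bank[name][0]))


def safety_layer(action, low, high, fallback):
    """Pass the action through only if every threshold is within bounds."""
    if all(low <= a <= high for a in action):
        return list(action)
    return list(fallback)  # e.g. a known-safe rule-based setting


# Hypothetical bank: name -> (simulator signature, policy function).
policy_bank = {
    "urban_sim": ((0.80, 0.20, 0.60), lambda loads: [0.2, 0.3]),
    "suburban_sim": ((0.50, 0.12, 0.25), lambda loads: [0.4, 0.5]),
    "rural_sim": ((0.20, 0.05, 0.05), lambda loads: [0.7, 1.4]),
}

real_signature = (0.55, 0.10, 0.30)  # measured on the live network (made up)
chosen = select_policy(real_signature, policy_bank)
raw_action = policy_bank[chosen][1]([0.9, 0.3, 0.3])
safe_action = safety_layer(raw_action, low=0.1, high=0.9, fallback=[0.4, 0.4])
print(chosen, raw_action, safe_action)
```

Keeping the safety check outside the learned policy means it still applies even when the selected policy meets conditions it never saw in simulation.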
Di Wu et al., Load Balancing for 5G Communication Networks via Data-Efficient Deep Reinforcement Learning, GLOBECOM 2021 - IEEE Global Communications Conference, IEEE, 2021.
Jikun Kang et al., Hierarchical Policy Learning for Hybrid Communication Load Balancing, ICC 2021 - IEEE International Conference on Communications, IEEE, 2021.
Xi Chen et al., One for All: Prediction of Communication Traffic via Data-Efficient Explainable Transfer Learning, GLOBECOM 2021 - IEEE Global Communications Conference, IEEE, 2021.