Multi-Modal Beam Selection: A Transfer Methodology for Multi-Frequency

By Dong Wang, Samsung Research China - Beijing
By Huiyang Wang, Samsung Research China - Beijing


The increasing use of millimeter-wave and terahertz technologies has alleviated spectrum constraints in wireless communication networks. Nevertheless, high-frequency electromagnetic waves suffer severe path loss, which reduces cellular coverage in outdoor settings [1]. Multiple-antenna and beamforming techniques therefore play a crucial role in compensating for this loss and enhancing throughput. For analog beamforming, existing methods such as brute-force search and hierarchical search are widely adopted in 3rd Generation Partnership Project (3GPP) specifications and commercial products. Although these methods guarantee beam selection accuracy, they can incur considerable time delay and transmission overhead, particularly in massive multiple-input multiple-output (MIMO) scenarios.

In this paper, we present a multi-modal beam selection method together with a frequency-transfer framework for model training. The key feature of our work is that the proposed framework generalizes, with limited samples, to a wide range of carrier frequencies and to complex propagation environments, including line-of-sight (LoS) and non-line-of-sight (NLoS) conditions. Specifically, we design a multi-modal network that embeds the relative locations of the base station (BS) and the user equipment (UE), as well as reflection features of the environment, to assist in selecting the best beam. To overcome the challenge of learning from a small dataset, we develop a frequency transfer framework that transfers both network parameters and data. Unlike existing methods that transfer only neural network parameters, our data transfer method extracts environment information and path loss information from the horizontal and vertical beams, respectively. By leveraging frequency transfer, the proposed multi-modal network significantly reduces the performance gap between large and small datasets and outperforms convolutional neural network (CNN), recurrent neural network (RNN), and multilayer perceptron (MLP) baselines.

System Setup & Problem Formulation

We consider a downlink tri-sector cellular system comprising a BS and K UEs, where each sector covers 120 degrees. The BS is equipped with a uniform planar array (UPA) of Nt = Nh × Nv antennas, where Nh and Nv denote the number of antennas in the horizontal and vertical directions, respectively. Each UE is equipped with a uniform linear array (ULA) of Nr antennas. Because signal attenuation varies with carrier frequency, deep learning (DL)-based approaches are usually frequency-specific and data-hungry: they require a large number of training samples at each carrier frequency. The goal of our work is to design a frequency transfer approach that exploits an available dataset to train a model for a new carrier frequency.
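To make the horizontal/vertical beam structure of the UPA concrete, the following Python sketch builds each UPA beam as the Kronecker product of a vertical and a horizontal ULA beam and performs the brute-force search mentioned in the introduction. The DFT-style codebook (beams uniformly spaced in sine space), half-wavelength spacing, and single-path LoS channel are illustrative assumptions rather than the configuration used in this work, and all function names are hypothetical.

```python
import cmath
import math


def ula_response(n, u, spacing=0.5):
    """Unit-norm ULA response for direction u = sin(angle); spacing in wavelengths."""
    return [cmath.exp(2j * math.pi * spacing * k * u) / math.sqrt(n) for k in range(n)]


def codebook_direction(n, i):
    """Assumed DFT-style codebook: n beams uniformly spaced in sine space."""
    return -1 + (2 * i + 1) / n


def upa_beam(n_h, n_v, i_h, i_v):
    """UPA beam with Nt = Nh * Nv entries: kron(vertical beam, horizontal beam)."""
    v = ula_response(n_v, codebook_direction(n_v, i_v))
    h = ula_response(n_h, codebook_direction(n_h, i_h))
    return [a * b for a in v for b in h]


def beam_gain(beam, channel):
    """Beamforming gain |w^H h|^2 of beam w on channel vector h."""
    return abs(sum(w.conjugate() * x for w, x in zip(beam, channel))) ** 2


def rsrp_matrix(n_h, n_v, channel):
    """Gain of every beam pair: rows are vertical beams, columns horizontal beams."""
    return [[beam_gain(upa_beam(n_h, n_v, i_h, i_v), channel) for i_h in range(n_h)]
            for i_v in range(n_v)]


def brute_force_search(n_h, n_v, channel):
    """Measure all Nh * Nv beam pairs and return (i_h, i_v) of the strongest one."""
    m = rsrp_matrix(n_h, n_v, channel)
    pairs = [(i_h, i_v) for i_v in range(n_v) for i_h in range(n_h)]
    return max(pairs, key=lambda p: m[p[1]][p[0]])


# Single-path LoS channel aligned with beam (3, 1) of an 8 x 4 UPA (Nt = 32).
channel = upa_beam(8, 4, 3, 1)
print(brute_force_search(8, 4, channel))  # prints (3, 1)
```

In this sketch the codebook is orthonormal, so the gains over all Nt beam pairs sum to the channel power. Brute-force search needs all Nh × Nv measurements, which is the overhead that learned beam selection aims to reduce.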

Data Augmentation

This subsection introduces two data augmentation methods, both of which are critical for data preprocessing.

1) Cell Rotation: Motivated by image rotation in image processing, we rotate each UE about the BS. To preserve the sector structure of the cell, the rotation angle is a multiple of the coverage angle of each sector, i.e., 120 degrees or 240 degrees.
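A minimal sketch of cell rotation, assuming 2-D UE coordinates and a hypothetical sample format `(ue_xy, label)`: each UE is rotated counter-clockwise about the BS by 120 and 240 degrees. Reusing the beam label unchanged is an assumption that the three sector panels are identical, so the rotated UE has the same geometry relative to its new sector; the surrounding environment is not rotated, so this remains an approximation.

```python
import math


def rotate_about_bs(ue_xy, bs_xy, angle_deg):
    """Rotate a 2-D UE position counter-clockwise about the BS by angle_deg degrees."""
    theta = math.radians(angle_deg)
    dx, dy = ue_xy[0] - bs_xy[0], ue_xy[1] - bs_xy[1]
    return (bs_xy[0] + dx * math.cos(theta) - dy * math.sin(theta),
            bs_xy[1] + dx * math.sin(theta) + dy * math.cos(theta))


def cell_rotation_augment(samples, bs_xy, sector_deg=120.0):
    """Return the original samples plus copies rotated by one and two sector angles.

    samples: list of (ue_xy, label) pairs (hypothetical format). The label is
    reused unchanged under the identical-panel assumption described above.
    """
    augmented = []
    for ue_xy, label in samples:
        augmented.append((ue_xy, label))
        for k in (1, 2):  # 120 and 240 degrees for a tri-sector cell
            augmented.append((rotate_about_bs(ue_xy, bs_xy, k * sector_deg), label))
    return augmented


# One UE 10 m east of the BS becomes three samples, one per sector.
aug = cell_rotation_augment([((10.0, 0.0), 5)], bs_xy=(0.0, 0.0))
```

Because the rotation angles are multiples of the sector width, the augmented dataset is three times larger while every BS-UE distance is preserved.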

2) Data Transfer: The available dataset S1 at carrier frequency f1 is used to generate samples at carrier frequency f2. As shown in Fig. 1, the horizontal reference signal received power (RSRP) vectors w11, w12, w13, and w14 follow the same trend of variation. Moreover, blockage along the transmission path degrades the signal power of the corresponding horizontal beam, which causes the RSRP curves of LoS and NLoS scenarios to differ. In addition, the vertical RSRP values are inversely related to the distance between the BS and the UE, so path loss features are implicitly contained in the variation of the vertical RSRP vectors. Therefore, we extract the environment features and the path loss features and generate new samples at carrier frequency f2 by combining the horizontal RSRP vectors from S1 with the vertical RSRP vectors from S2, the dataset at f2.
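One possible reading of the data transfer step, sketched in Python: RSRP matrices are stored in dB with rows indexed by vertical beams and columns by horizontal beams; the relative horizontal profile (environment features) is taken from an S1 sample, and the per-vertical-beam levels (path loss features) are taken from an S2 sample. The additive dB model, the matrix layout, and how S1 and S2 samples are paired (for example, by similar BS-UE distance) are assumptions made for illustration, and the function names are hypothetical.

```python
def horizontal_profile(w):
    """Column means minus the overall mean (dB): relative horizontal-beam shape."""
    n_v, n_h = len(w), len(w[0])
    overall = sum(sum(row) for row in w) / (n_v * n_h)
    return [sum(w[i][j] for i in range(n_v)) / n_v - overall for j in range(n_h)]


def vertical_levels(w):
    """Row means (dB): one level per vertical beam, carrying path-loss information."""
    return [sum(row) / len(row) for row in w]


def transfer_sample(w_s1, w_s2):
    """Synthesize an f2 RSRP matrix from an S1 sample (environment) and an S2 sample
    (path loss) under an assumed additive model in dB: W[i][j] = v_i + h_j."""
    if len(w_s1) != len(w_s2) or len(w_s1[0]) != len(w_s2[0]):
        raise ValueError("RSRP matrices must have the same shape")
    h = horizontal_profile(w_s1)
    v = vertical_levels(w_s2)
    return [[v_i + h_j for h_j in h] for v_i in v]


# Toy 2 x 3 matrices (rows: vertical beams, columns: horizontal beams), in dBm.
w_s1 = [[-80.0, -70.0, -75.0], [-82.0, -72.0, -77.0]]
w_s2 = [[-95.0, -96.0, -94.0], [-100.0, -101.0, -99.0]]
w_new = transfer_sample(w_s1, w_s2)
```

By construction, each row of the synthesized matrix has the same mean as the corresponding row of the S2 sample, and its horizontal beam ranking matches the column-averaged RSRP of the S1 sample.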

Figure 1. The RSRP matrix of the blocked and unblocked UEs

Proposed FtransNET

The transformer is adopted as the backbone of the proposed network. As shown in Fig. 2, the proposed FtransNet consists of four components: data augmentation, preprocessing, an embedding layer, and a transformer. The details of each component are as follows.

1) Data Augmentation: Given the samples in S1 and S2, cell rotation and data transfer are applied to augment the samples at carrier frequency f2.

2) Preprocessing: The raw auxiliary inputs, namely the locations and the transmission-environment images, are first converted into the relative coordinate ∆p, the relative distance d, the angle θ, and a horizontal beam index label derived from the image. The relative coordinate ∆p, relative distance d, and RSRP are then normalized to the range [0, 1] to make model training more robust.

3) Embedding: The preprocessed location information and RSRP are concatenated and projected by the embedding layer. In addition, a reflection embedding table is looked up to map the reflection index, determined by the location of the UE, to an embedding. All embeddings are then concatenated in the embedding layer.

4) Transformer: Each transformer encoder block consists of the sequential modules of layer normalization, multi-head self-attention, a residual (ResNet-style) connection, and a feed-forward layer. After a stack of 4 such blocks, the output features are projected by an MLP layer into the predicted RSRP values Ŵ, and the beam with the maximum predicted RSRP is selected as the best beam.
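The preprocessing step can be sketched as follows, assuming BS and UE positions are given as coordinate tuples and that min-max scaling is applied independently to each feature column; the function names and the toy positions are hypothetical. The angle θ is taken here as the azimuth returned by atan2, which is one of several possible conventions, and it is left unnormalized because only ∆p, d, and RSRP are scaled in the text.

```python
import math


def relative_features(bs, ue):
    """Relative coordinate dp = ue - bs, distance d, and azimuth theta (radians)."""
    dp = [u - b for u, b in zip(ue, bs)]
    d = math.sqrt(sum(c * c for c in dp))
    theta = math.atan2(dp[1], dp[0])  # azimuth in the horizontal plane
    return dp, d, theta


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def normalize_columns(rows):
    """Apply min_max independently to each feature column of a list of rows."""
    scaled_cols = [min_max(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_cols)]


# Hypothetical example: one BS and three UEs (x, y) in metres, RSRP in dBm.
bs = (0.0, 0.0)
ues = [(30.0, 40.0), (-60.0, 80.0), (10.0, -20.0)]
rsrp = [-85.0, -97.0, -80.0]
rows = []
for ue, p in zip(ues, rsrp):
    dp, d, _theta = relative_features(bs, ue)
    rows.append(dp + [d, p])  # [dx, dy, d, RSRP]; theta kept separately
features = normalize_columns(rows)
```

In practice the minimum and maximum would be computed on the training set and reused at inference, so that test samples are scaled consistently.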

Figure 2. The proposed FtransNET for beam selection.
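The embedding layer described above can be sketched in plain Python, assuming a single linear projection for the concatenated location and RSRP features and a learned lookup table indexed by the reflection index. The class name, dimensions, and random initialization are hypothetical; a real implementation would use a deep learning framework with trainable parameters. How the resulting embeddings are arranged into the transformer's input sequence is not detailed in the text, so the sketch stops at one embedding vector per sample.

```python
import random


class EmbeddingLayer:
    """Hypothetical sketch: linear projection of continuous features plus a
    lookup table for the discrete reflection index, concatenated."""

    def __init__(self, n_features, n_reflections, d_proj, d_refl, seed=0):
        rng = random.Random(seed)
        # Parameters are randomly initialized here; in practice they are trained.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                       for _ in range(d_proj)]
        self.bias = [0.0] * d_proj
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(d_refl)]
                      for _ in range(n_reflections)]

    def __call__(self, features, reflection_index):
        # Project the concatenated location + RSRP features: W x + b.
        proj = [sum(w * x for w, x in zip(row, features)) + b
                for row, b in zip(self.weight, self.bias)]
        # Look up the reflection embedding and concatenate it with the projection.
        return proj + self.table[reflection_index]


# Example: 4 location features + 8 sampled RSRP values, 6 possible reflection indices.
layer = EmbeddingLayer(n_features=12, n_reflections=6, d_proj=16, d_refl=8)
token = layer([0.5] * 12, reflection_index=3)
```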


We adopt a beam selection dataset consisting of 10^4 samples at carrier frequency f1 and 10^3 samples at carrier frequency f2, which has been made publicly available. For a fair comparison, the baseline CNN, RNN, and MLP apply the same data augmentation and preprocessing before model training. By default, the number of sampled beam pairs in an RSRP matrix is Ns = 8. The beam selection accuracy of the proposed FtransNet and the benchmarks is evaluated in Fig. 3. The proposed FtransNet outperforms the CNN, RNN, and MLP in top-1, top-3, and top-5 accuracy. In addition, the proposed data augmentation scheme improves beam selection accuracy by 5%, which demonstrates the advantage of the proposed frequency transfer method: the enriched environment and location features significantly reduce the performance gap between carrier frequencies f1 and f2. By exploiting the reflection feature, the proposed FtransNet achieves 96% beam selection accuracy with only 6% of the overhead of brute-force search.
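Top-k accuracy, as reported in Fig. 3, can be computed from the predicted RSRP vector Ŵ of each sample as sketched below. The input formats (a flat list of predicted beam-pair RSRP values per sample and the integer index of the true best beam) and the function names are assumptions for illustration.

```python
def top_k_beams(pred_rsrp, k):
    """Indices of the k beams with the largest predicted RSRP, strongest first."""
    return sorted(range(len(pred_rsrp)), key=lambda i: pred_rsrp[i], reverse=True)[:k]


def top_k_accuracy(predictions, true_best, k):
    """Fraction of samples whose true best beam is among the top-k predicted beams."""
    hits = sum(t in top_k_beams(p, k) for p, t in zip(predictions, true_best))
    return hits / len(true_best)


# Toy example: predicted RSRP over 3 beams for 3 samples, and the true best beams.
preds = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.4], [0.2, 0.3, 0.1]]
truth = [1, 2, 0]
for k in (1, 2, 3):
    print(k, top_k_accuracy(preds, truth, k))
```

A common way to use top-k predictions is to measure only the k candidate beams and keep the strongest, trading a small amount of extra measurement overhead for higher accuracy.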


Figure 3. Top-1, top-3, top-5, and averaged beam selection accuracy of the proposed scheme and the CNN, RNN, and MLP networks.


In this paper, we developed a frequency transfer method and proposed a multi-modal beam selection scheme that works with limited samples. Through data augmentation and preprocessing, the environment and location features are significantly enriched, which improves beam selection performance at low overhead. The main implication of this work is a data transfer mechanism across carrier frequencies that generalizes well to complex environments and new carrier frequencies. In particular, this approach shows great potential for overcoming the data shortage in training DL-based methods for future commercial radio access networks.


[1] W. Roh, J.-Y. Seol, J. Park, B. Lee, J. Lee, Y. Kim, J. Cho, K. Cheun, and F. Aryanfar, “Millimeter-wave beamforming as an enabling technology for 5G cellular communications: theoretical feasibility and prototype results,” IEEE Commun. Mag., vol. 52, no. 2, pp. 106–113, Feb. 2014.