Samsung's New Predictable Sparse Attention Technique

Language dominates and shapes our lives in its written and spoken forms. Computational linguistics is the scientific study of language from a computational perspective. The Annual Meeting of the Association for Computational Linguistics (ACL) is organized by the Association for Computational Linguistics and maintains rigorous acceptance standards. A paper from the multimodal MRC division of Samsung R&D Institute China–Beijing (SRC-B) has recently been accepted at ACL 2022.

The Paper’s Key Points
1. The paper proposes the Fourier Sparse Attention for Transformer (FSAT), which extends the transformer to long
    sequences. The overall complexity with respect to sequence length is reduced from O(L²) to O(L log L) (see the quick
    comparison after this list).
2. It introduces a pooled hidden state cross module to implement FSAT.
3. Empirically, extensive experiments on natural language, vision, and math tasks demonstrate the advantages of the
    proposed methods, and new state-of-the-art results are achieved on the Long Range Arena (LRA) benchmark.
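
To give a rough sense of that gap, the snippet below compares L² with L·log2(L) for a few sequence lengths. These are illustrative operation counts only; real runtimes depend on constants and the implementation.

```python
# Back-of-the-envelope comparison of the two complexity classes mentioned above.
import math

for L in (1_024, 4_096, 16_384):
    print(f"L={L:>6,}:  L^2 = {L*L:>13,}   L*log2(L) = {int(L * math.log2(L)):>9,}")
```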

SRC-B’s Multimodal MRC Component
We focus on advanced technologies in the natural language processing (NLP) field, including pre-trained language understanding, machine reading comprehension (MRC), dialogue systems, and more. We are also interested in multimodal topics and high-efficiency learning architectures. We have already published more than 10 papers at top conferences and won first place in prestigious international competitions such as the SemEval (International Workshop on Semantic Evaluation) Challenge. We will keep going and try to make more contributions to Samsung.

SRC-B’s team members

About ACL 2022
The Association for Computational Linguistics is the premier international scientific and professional society for people working on computational problems involving human language, a field often referred to as either computational linguistics or natural language processing (NLP). ACL was founded in 1962 and was originally named the Association for Machine Translation and Computational Linguistics (AMTCL). It became known as ACL in 1968. ACL 2022 is the association’s 60th meeting.

“Long-Range Sequence Modeling with Predictable Sparse Attention”
This study proposes an improved, efficient transformer for long-range sequence modeling and achieves state-of-the-art performance on the public LRA benchmark. The results are both practically and theoretically significant.

Proposed hidden state cross module

The self-attention mechanism is an effective approach for capturing global context dependencies in sequence modeling, but it suffers from quadratic complexity in time and memory. Because the attention matrix is sparse, much of this computation is redundant. Therefore, in this paper we design an efficient Transformer architecture named “Fourier Sparse Attention for Transformer” (FSAT) for fast long-range sequence modeling. We provide a brand-new perspective for constructing a sparse attention matrix, i.e., making the sparse attention matrix predictable.
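
For readers less familiar with where the quadratic cost comes from, here is a minimal sketch of standard scaled dot-product self-attention; the L × L score matrix it materializes is the bottleneck FSAT avoids. Tensor names and shapes are illustrative, not taken from the paper.

```python
# Minimal sketch of standard scaled dot-product self-attention in PyTorch.
import torch
import torch.nn.functional as F

def full_self_attention(x, w_q, w_k, w_v):
    """x: (batch, L, d). The (L, L) score matrix is what drives the O(L^2) cost."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                       # each (batch, L, d)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)   # (batch, L, L): quadratic in L
    weights = F.softmax(scores, dim=-1)                       # in practice most entries are near zero
    return weights @ v                                        # (batch, L, d)

# Example: doubling L quadruples the size of the score matrix.
x = torch.randn(1, 1024, 64)
w = [torch.randn(64, 64) for _ in range(3)]
out = full_self_attention(x, *w)
```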

The two core sub-modules are:
1. A fast Fourier transform (FFT) based hidden state cross module, which captures and pools L² semantic combinations in
    O(L log L) time (a rough sketch follows this list).
2. A sparse attention matrix estimation module, which predicts the dominant elements of the attention matrix from the
    hidden state cross module’s output.
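
The sketch below illustrates the general idea behind an FFT-based crossing of hidden states: a circular convolution computed in the frequency domain mixes information from all L × L position pairs while costing only O(L log L). This is a simplified illustration under our own assumptions (the projections proj_a/proj_b are made up), not the paper's exact module.

```python
# Hypothetical FFT-based "hidden state crossing": element-wise products in the
# frequency domain correspond to circular convolution over positions, i.e. a pooled
# mix of pairwise combinations, computed in O(L log L).
import torch
import torch.nn as nn

class FFTCrossing(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.proj_a = nn.Linear(d_model, d_model)   # illustrative projections, not from the paper
        self.proj_b = nn.Linear(d_model, d_model)

    def forward(self, x):                                        # x: (batch, L, d)
        a = torch.fft.rfft(self.proj_a(x), dim=1)                # FFT over the sequence axis
        b = torch.fft.rfft(self.proj_b(x), dim=1)
        crossed = torch.fft.irfft(a * b, n=x.shape[1], dim=1)    # circular convolution: sums over position pairs
        return crossed                                           # (batch, L, d), a pooled pairwise summary

x = torch.randn(2, 512, 64)
y = FFTCrossing(64)(x)   # same shape as x
```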

Through reparametrization and gradient truncation, FSAT obtains the indices of the dominant elements. The overall complexity with respect to sequence length is reduced from O(L²) to O(L log L). Extensive experiments on natural language, vision, and math tasks show that FSAT remarkably outperforms standard multi-head attention and its variants on various long-sequence tasks at low computational cost. FSAT also achieves new state-of-the-art results on the Long Range Arena benchmark.
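
As a rough illustration of how predicted indices can drive a cheaper attention step, the sketch below keeps only a fixed number of high-scoring key positions and attends over those alone; the discrete top-k choice receives no gradient, standing in for the paper's reparametrization and gradient truncation. The names, shapes, and the n_keep parameter are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch of attention restricted to a predicted set of dominant key positions.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, key_scores, n_keep=64):
    """q, k, v: (batch, L, d); key_scores: (batch, L) importance of each key position
    (e.g. derived from the crossing module's output). Only the n_keep highest-scoring
    keys are attended to, so the cost is O(L * n_keep) instead of O(L^2)."""
    idx = key_scores.topk(n_keep, dim=-1).indices               # discrete choice: no gradient flows through it
    idx = idx.unsqueeze(-1).expand(-1, -1, k.shape[-1])         # (batch, n_keep, d)
    k_sel = torch.gather(k, 1, idx)                             # gather the selected keys and values
    v_sel = torch.gather(v, 1, idx)
    scores = q @ k_sel.transpose(-2, -1) / (q.shape[-1] ** 0.5) # (batch, L, n_keep)
    return F.softmax(scores, dim=-1) @ v_sel                    # (batch, L, d)

# Example usage with random tensors.
b, L, d = 2, 1024, 64
q, k, v = (torch.randn(b, L, d) for _ in range(3))
key_scores = torch.randn(b, L)   # in FSAT this would come from the estimation module
out = sparse_attention(q, k, v, key_scores)
```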