Publications

Internal Pattern Matching Queries in a Text and Applications

Published

SIAM Journal on Computing

Date

2023.05.02

Abstract

We consider several types of internal queries, that is, questions about fragments of a given text $T$ specified in constant space by their locations in $T$. Our main result is an optimal data structure for Internal Pattern Matching (IPM) queries which, given two fragments $x$ and $y$, ask for a representation of all fragments contained in $y$ and matching $x$ exactly; this problem can be viewed as an internal version of the Exact Pattern Matching problem. Our data structure answers IPM queries in time proportional to the quotient $|y|/|x|$ of fragments' lengths, which is required due to the information content of the output. If $T$ is a text of length $n$ over an integer alphabet of size $\sigma$, then our data structure occupies $O(n/\log_\sigma n)$ machine words (that is, $O(n \log \sigma)$ bits) and admits an $O(n/\log_\sigma n)$-time construction algorithm. We show the applicability of IPM queries for answering internal queries corresponding to other classic string processing problems. Among others, we derive optimal data structures reporting the periods of a fragment and testing the cyclic equivalence of two fragments. IPM queries have already found numerous further applications, following the path paved by the classic Longest Common Extension (LCE) queries of Landau and Vishkin (JCSS, 1988). In particular, IPM queries have been implemented in grammar-compressed and dynamic settings and, along with LCE queries, constitute elementary operations of the PILLAR model, developed by Charalampopoulos, Kociumaka, and Wellnitz (FOCS 2020). On the way to our main result, we provide a novel construction of string synchronizing sets of Kempa and Kociumaka (STOC 2019). Our method, based on a new restricted version of the recompression technique of Jeż (J. ACM, 2016), yields a hierarchy of $O(\log n)$ string synchronizing sets covering the whole spectrum of fragments' lengths.
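
To make the query semantics in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the data structure described above and does not achieve its time or space bounds; it only illustrates what an IPM query, a period query, and a cyclic equivalence test are expected to return. The function names, the `Fragment` type, and the half-open position convention are assumptions introduced here for illustration.

```python
from typing import List, Optional, Tuple

# Assumed convention (not from the paper): a fragment is a half-open
# pair (start, end) of positions in the text, so it takes constant space.
Fragment = Tuple[int, int]


def naive_ipm(text: str, x: Fragment, y: Fragment) -> List[int]:
    """Start positions (in text) of all occurrences of fragment x inside fragment y."""
    xs, xe = x
    ys, ye = y
    pattern = text[xs:xe]
    m = len(pattern)
    # Try every start position at which a copy of x still fits inside y.
    return [i for i in range(ys, ye - m + 1) if text[i:i + m] == pattern]


def as_progression(positions: List[int]) -> Optional[Tuple[int, int, int]]:
    """Compress sorted positions into (first, step, count); None if that is impossible."""
    if not positions:
        return None
    if len(positions) == 1:
        return (positions[0], 0, 1)
    step = positions[1] - positions[0]
    if any(b - a != step for a, b in zip(positions, positions[1:])):
        return None
    return (positions[0], step, len(positions))


def naive_periods(text: str, x: Fragment) -> List[int]:
    """All periods p (1 <= p <= length) of fragment x, checked by brute force."""
    s = text[x[0]:x[1]]
    # p is a period when s shifted by p agrees with s on the overlap.
    return [p for p in range(1, len(s) + 1) if s[p:] == s[:len(s) - p]]


def naive_cyclic_equivalence(text: str, x: Fragment, y: Fragment) -> bool:
    """True when fragment y is a rotation of fragment x."""
    a, b = text[x[0]:x[1]], text[y[0]:y[1]]
    return len(a) == len(b) and b in a + a


if __name__ == "__main__":
    T = "abababab"
    hits = naive_ipm(T, (0, 4), (1, 8))   # x = "abab", y = "bababab"
    print(hits, as_progression(hits))
    print(naive_periods(T, (0, 4)))
    print(naive_cyclic_equivalence(T, (0, 4), (1, 5)))
```

In the example, the fragment searched in is less than twice as long as the pattern, and the occurrences can be reported as a single arithmetic progression (first position, step, count). This compact form is one way to read the "representation of all fragments" mentioned in the abstract: a longer fragment can be covered by roughly |y|/|x| such windows, each contributing one progression, which is consistent with the stated query time.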