Reviving Iterative Training with Mask Guidance for Interactive Segmentation


IEEE International Conference on Image Processing (ICIP)




Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. These methods are significantly more computationally expensive than feedforward approaches, as they run backward gradient passes during inference. Moreover, backward passes are not supported in popular mobile frameworks, which complicates the deployment of such methods on embedded devices. In this paper, we study design choices for interactive segmentation and discover that state-of-the-art results can be obtained without any additional optimization schemes. We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps. It allows not only segmenting an entirely new object but also correcting an existing mask. We analyze the performance of models trained on different datasets and observe that the choice of a training dataset has a large impact on the quality of interactive segmentation. We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations outperform all existing models. The code and trained models are available at