Free-Form Image Inpainting with Gated Convolution


  • 1 University of Illinois at Urbana–Champaign
  • 2 Adobe Research

Note: the first version, DeepFillv1, can be found here.


YouTube Video Demo

Best viewed at the highest resolution (1080p).


Free-form image inpainting results by our system built on gated convolution. It can take free-form masks and guidance inputs such as sketches from users. Our system helps users quickly remove distracting objects, modify image layouts, edit faces and interactively create novel objects in images.


Abstract

We present a novel deep learning based image inpainting system to complete images with free-form masks and guidance inputs. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution, which treats all input pixels as valid, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not suitable. To this end, we also present a novel GAN loss, named SN-PatchGAN, obtained by applying spectral-normalized discriminators on dense image patches. It is simple in formulation, and fast and stable in training. Results on automatic image inpainting and the user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. We show that our system helps users quickly remove distracting objects, modify image layouts, clear watermarks, edit faces and interactively create novel objects in images. Furthermore, visualization of the learned feature representations reveals the effectiveness of gated convolution and provides an interpretation of how the proposed neural network fills in missing regions.
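To make the SN-PatchGAN loss concrete, here is a minimal PyTorch sketch (not the authors' implementation; layer widths and all names are illustrative). The discriminator is fully convolutional with spectral-normalized, stride-2 convolutions; its raw 3-D output map is treated as a dense grid of patch scores, and the hinge loss is applied to every element of that map:

import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def snconv(cin, cout):
    # 5x5 stride-2 convolution wrapped in spectral normalization
    return spectral_norm(nn.Conv2d(cin, cout, kernel_size=5, stride=2, padding=2))

class SNPatchDiscriminator(nn.Module):
    # Fully convolutional: no sigmoid and no global pooling at the end,
    # so the output is a (batch, c, h, w) map of dense patch scores.
    def __init__(self, in_channels=4, width=64):  # e.g. RGB + mask channels
        super().__init__()
        layers, prev = [], in_channels
        for c in [width, 2 * width, 4 * width, 4 * width, 4 * width, 4 * width]:
            layers += [snconv(prev, c), nn.LeakyReLU(0.2)]
            prev = c
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

def d_hinge_loss(real_scores, fake_scores):
    # Hinge loss averaged over every spatial location and channel
    return torch.mean(torch.relu(1.0 - real_scores)) + torch.mean(torch.relu(1.0 + fake_scores))

def g_hinge_loss(fake_scores):
    return -torch.mean(fake_scores)

Because each score in the output map has a large receptive field over the input, no separate global discriminator is needed; the dense patch scores cover both local detail and near-global context.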


Gated Convolution

Illustration of partial convolution (left) and gated convolution (right).
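The difference in the figure can be stated compactly in code. Below is a minimal PyTorch sketch of a gated convolution layer (class and argument names are ours, for illustration): instead of re-normalizing by a hard, rule-updated binary mask as partial convolution does, the layer learns a soft gate from the data itself, computing output = phi(conv_f(x)) * sigmoid(conv_g(x)) per channel and per spatial location:

import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    # Two parallel convolutions over the same input: one produces features,
    # the other produces gating values squashed to (0, 1) by a sigmoid.
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, dilation=1, activation=nn.ELU()):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2  # "same" padding for odd kernels
        self.feature = nn.Conv2d(in_channels, out_channels, kernel_size,
                                 stride, padding, dilation)
        self.gating = nn.Conv2d(in_channels, out_channels, kernel_size,
                                stride, padding, dilation)
        self.activation = activation

    def forward(self, x):
        return self.activation(self.feature(x)) * torch.sigmoid(self.gating(x))

# Usage: the input stacks image, mask and optional sketch channels.
x = torch.randn(1, 5, 256, 256)           # RGB + mask + sketch
y = GatedConv2d(5, 32, kernel_size=5)(x)  # -> (1, 32, 256, 256)

Since the gate is learned per output channel and per pixel, the network can softly attend to valid pixels, masked pixels and sketch strokes differently, which is exactly what the gating visualizations below show.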


Visualization of Gated Convolution

Comparisons of gated convolution to partial convolution, with visualization and interpretation of the learned gating values. The 1st row shows our inpainting network architecture, based on DeepFillv1 with all convolutions replaced by gated convolutions; for simplicity, the refinement network of DeepFillv1 is omitted from the figure. With the same settings, we train two models, one based on gated convolution and one on partial convolution. The 2nd row directly visualizes the intermediate un-normalized gating outputs. The values differ mainly across three regions: background, mask and sketch. The 3rd row gives an interpretation based on which region(s) have higher gating values. Interestingly, we also find that for some channels (e.g. channel-31 of the layer after dilated convolution), the learned gating values are based on foreground/background semantic segmentation. For comparison, the 4th row visualizes the un-learnable, fixed binary mask M of partial convolution. Inpainting results of gated convolution and partial convolution can be found in Section 4 of the paper.
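One way to reproduce this kind of visualization, assuming a network composed of the GatedConv2d layers sketched above, is a forward hook that re-runs the gating branch of a chosen layer to capture its pre-sigmoid (un-normalized) values; net, batch and the layer index below are placeholders, not names from the released code:

import torch

gating_maps = {}

def capture_gating(name):
    def hook(module, inputs, output):
        # Re-run only the gating convolution to get pre-sigmoid values
        gating_maps[name] = module.gating(inputs[0]).detach()
    return hook

layer = net.layers[6]  # hypothetical: the layer after the dilated convolutions
handle = layer.register_forward_hook(capture_gating("dilated_out"))
with torch.no_grad():
    net(batch)         # batch stacks image, mask and sketch channels
handle.remove()

channel_31 = gating_maps["dilated_out"][0, 31]  # one channel, as in the figure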


Comparison Results


More Results



Citation

@article{yu2018free,
  title={Free-Form Image Inpainting with Gated Convolution},
  author={Yu, Jiahui and Lin, Zhe and Yang, Jimei and Shen, Xiaohui and Lu, Xin and Huang, Thomas S},
  journal={arXiv preprint arXiv:1806.03589},
  year={2018}
}

@article{yu2018generative,
  title={Generative Image Inpainting with Contextual Attention},
  author={Yu, Jiahui and Lin, Zhe and Yang, Jimei and Shen, Xiaohui and Lu, Xin and Huang, Thomas S},
  journal={arXiv preprint arXiv:1801.07892},
  year={2018}
}