Generative adversarial network based image inpainting

Research Article
Open access


Jingjing Gao 1*
  • 1 Beijing University of Technology, No. 100 Ping Le Yuan, Chaoyang District, Beijing, PR China, 100124
  • * Corresponding author: gaojingjing@emails.bjut.edu.cn

Abstract

Image inpainting is the repair of pixels in damaged areas of an image so that the result looks as much like the original image as possible. Deep learning-based image inpainting is a prominent area of current research. This paper presents a systematic and comprehensive study of GAN-based image inpainting together with an analytical summary. First, it introduces GAN, including its basic principle and mathematical formulation. Second, recent GAN-based image inpainting algorithms are surveyed, and the advantages and disadvantages of each algorithm are listed. The paper then lists the evaluation metrics and common datasets of deep learning-based image inpainting. Finally, existing image inpainting methods are summarized, and ideas for key future research directions are presented.

Keywords:

computer vision, image inpainting, deep learning, Generative Adversarial Networks(GAN)


1. Introduction

Image inpainting is a technology that seeks to repair damaged pixels in an incomplete image, reconstructing a high-quality approximation that is semantically faithful to the original. The swift development of deep learning, the spread of artificial intelligence applications, and a significant rise in computer processing power have advanced science and technology and improved people's quality of life. Many computer vision applications rely heavily on deep learning-based image inpainting techniques, and it has grown into a significant area of computer vision research.

The significance of studying image inpainting is not only to improve the current research method, but also to expand and improve its application in real life. The main applications are:

1) Object removal: removing unwanted objects from an image and repairing the areas they occluded.

2) Image restoration: repairing pixel loss in images caused by improper processing, such as scratches.

3) Image retouching: retouching photos of different people, such as removing wrinkles, moles, and other facial blemishes.

4) Text removal: removing unwanted text, watermarks, etc. from images.

Because of these wide applications in real life, image inpainting receives a lot of attention from researchers and has great development prospects.

Traditional image inpainting methods cannot repair large broken areas or images with complex mixtures of structure and texture. Researchers have therefore applied deep learning to these computer vision tasks, and deep learning-based image inpainting methods have emerged as a result.

Among the deep learning models with the most prominent inpainting performance are the AutoEncoder (AE) [1], U-Net [2], Generative Adversarial Networks [3], the Transformer [4], and so on. By training deep models, they obtain high-level semantic information and learn the structural and textural information of images, which lets them repair images with large broken regions. These methods address the shortcomings of traditional image inpainting and achieve excellent inpainting results.

In this paper, a systematic and comprehensive study is conducted for GAN-based image inpainting methods. Various GAN-based image inpainting methods are analyzed and described, and common datasets and evaluation metrics are listed.

2. Related works

This section briefly introduces the basic principles of GAN and the mathematical representation of GAN.

2.1. Generative Adversarial Networks

GAN (Generative Adversarial Networks) is an unsupervised deep learning model for generating data by computer. The model produces excellent outputs by training two modules that play against each other in one framework: the generative model (generator) and the discriminative model (discriminator). Generative adversarial networks are considered among the most promising and active models currently used for sample data generation, image generation, image inpainting, image translation, text generation, and other tasks.

Generator: generates data, with the aim of "fooling" the discriminator as much as possible; the generated data is denoted \( G(z) \) .

Discriminator: determines whether data is real or fake, with the aim of catching the "fake data" created by the generator as much as possible.

Thus, G and D form a game: as (adversarial) training proceeds, the data G generates gets closer and closer to the real data, while D's ability to discriminate grows higher and higher.
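
To make the two roles concrete, the following is a minimal PyTorch sketch of a generator and a discriminator (the fully connected architecture, layer sizes, and data dimensionality are illustrative assumptions, not taken from any particular inpainting model):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector z to a generated sample G(z)."""
    def __init__(self, z_dim=64, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),  # generated values scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a sample x to D(x), the estimated probability that x is real."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability in (0, 1)
        )

    def forward(self, x):
        return self.net(x)
```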

2.2. Training Process of GAN

Stage 1: fix the discriminator D and train the generator G. Against a fixed, reasonably good discriminator, G keeps generating "fake data" and hands it to D to judge. At the beginning G is very weak, so its outputs are easily identified; as training proceeds, G improves until D can no longer tell real from fake and is essentially guessing blindly, judging correctly with probability 1/2.

Stage 2: fix the generator G and train the discriminator D. After the first stage, it is meaningless to keep training G, so we fix G and start training D. Through continuous training, D improves its discriminative ability until it can almost always identify the fake data.

Repeat the first and second stages. Through this continuous cycle, both G and D become stronger and stronger. Eventually we obtain a generator G that works very well, and we can use it to generate data.
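
A minimal sketch of this alternating loop in PyTorch follows (the toy networks, optimizer settings, and random "real" data are illustrative assumptions; the generator update uses the common non-saturating loss rather than the original minimax form):

```python
import torch
import torch.nn as nn

# Toy stand-ins for G and D; real inpainting models would be convolutional.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def real_batch(n=32):
    # Placeholder for samples from p_data; a real setup would load a dataset.
    return torch.rand(n, 784) * 2 - 1

for step in range(1000):
    x = real_batch()
    z = torch.randn(x.size(0), 64)
    real, fake = torch.ones(x.size(0), 1), torch.zeros(x.size(0), 1)

    # Stage 2 above: hold G fixed (detach) and train D to separate real from fake.
    opt_d.zero_grad()
    loss_d = bce(D(x), real) + bce(D(G(z).detach()), fake)
    loss_d.backward()
    opt_d.step()

    # Stage 1 above: hold D fixed (only opt_g steps) and train G to fool D.
    opt_g.zero_grad()
    loss_g = bce(D(G(z)), real)  # non-saturating generator loss
    loss_g.backward()
    opt_g.step()
```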

2.3. Mathematical Expression of GAN

The generative model maps data from an input space into the generative space (i.e., from the input data, output data is produced under the action of a function). In order to make the generated distribution closer to the real one, the generating function G, typically modeled as a neural network, can represent a variety of completely different distribution types.

Taking the discriminator D as an example, its cost function in the generative adversarial network is written \( J^{(D)} \) and takes the form shown below:

\( {J^{(D)}}({θ^{(D)}},{θ^{(G)}})=-\frac{1}{2}{E_{x\sim {p_{data}}}}\log{D(x)}-\frac{1}{2}{E_{z\sim {p_{z}}}}\log{(1-D(G(z)))}\ \ \ (1) \)

The generator and discriminator are closely coupled: together they can be viewed as playing a zero-sum game, so their combined cost is zero, i.e., \( J^{(G)}=-J^{(D)} \) .
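
Since \( J^{(G)}=-J^{(D)} \) , the two costs can be folded into the single minimax value function of Goodfellow et al. [3]:

\( \underset{G}{min}\ \underset{D}{max}\ V(D,G)={E_{x\sim {p_{data}}}}[\log{D(x)}]+{E_{z\sim {p_{z}}}}[\log{(1-D(G(z)))}]\ \ \ (2) \)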

3. Deep learning based image inpainting methods

There are many inpainting methods based on deep learning; here we mainly summarize the GAN-based image inpainting methods and list the advantages and disadvantages of each.

3.1. GAN class image inpainting methods

The image inpainting method based on the GAN structure generates the restored image directly with a generator, whose input can be random noise. Compared with patch-based and encoder-decoder methods, GANs have advantages in face inpainting, so they are more promising for use in people's daily lives.
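
In practice the restored image is usually not taken raw from the generator: the known pixels of the damaged image are kept and only the hole region is filled from the generator's output, as in the blending step of semantic inpainting [5]. A minimal sketch (the tensor shapes and the convention that the mask is 1 on known pixels are assumptions):

```python
import torch

def composite(damaged, generated, mask):
    """Blend the generator's output into the hole of a damaged image.

    damaged:   damaged input image, shape (B, C, H, W)
    generated: generator output for the same batch, same shape
    mask:      1 where pixels are known, 0 inside the hole
    """
    return mask * damaged + (1 - mask) * generated
```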

4. Evaluation index and datasets

4.1. Evaluation index

In order to evaluate the performance of image inpainting methods, researchers have developed different metrics to assess the restored images they generate. These fall into objective and subjective metrics. Subjective metrics mainly rely on human judgment, so in most cases objective metrics are used for quantitative evaluation. Table 2 lists some major full-reference image evaluation metrics and their characteristics.
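
As an illustration, two of the most widely used full-reference metrics are PSNR and SSIM; a minimal sketch using scikit-image (the choice of this library and images scaled to [0, 1] are assumptions; `channel_axis` requires a recent scikit-image version):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(original, restored):
    """Full-reference comparison of a restored image against the original.

    Both inputs are float arrays of shape (H, W, 3) with values in [0, 1].
    """
    psnr = peak_signal_noise_ratio(original, restored, data_range=1.0)
    ssim = structural_similarity(original, restored, data_range=1.0,
                                 channel_axis=-1)  # per-channel SSIM, averaged
    return psnr, ssim
```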

4.2. Datasets

Deep learning-based image inpainting methods need experiments on a large number of images to evaluate a method's effect, and they learn image features by training on large numbers of images. However, it is very difficult to collect images and corresponding broken images oneself, so researchers usually use public image datasets for training and testing. Some of the datasets frequently used by researchers are given in Table 3.
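
Because paired clean/broken images are rarely collected directly, training pairs are typically synthesized by masking out regions of clean dataset images. A minimal sketch (the square hole and its size are illustrative choices):

```python
import numpy as np

def make_broken_image(image, hole=64, rng=None):
    """Synthesize a (broken image, mask) training pair from a clean image.

    image: float array of shape (H, W, C). A square region of side `hole`
    is zeroed out at a random position to mimic a damaged area.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - hole))
    left = int(rng.integers(0, w - hole))
    mask = np.ones((h, w, 1), dtype=image.dtype)
    mask[top:top + hole, left:left + hole] = 0.0
    return image * mask, mask
```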

5. Conclusion

Image inpainting is an irreplaceable part of the field of computer vision. With the rapid development of computers and the frequent use of digital tools in recent years, image inpainting has received increasing attention from researchers. Research on deep learning-based image inpainting has a relatively short history, but progress has been rapid, with optimizations of model structures, loss functions, and prior information yielding better inpainting results. Here we mainly summarized GAN-based image inpainting and briefly reviewed the common datasets and evaluation metrics. The following remarks on the shortcomings of existing image inpainting methods are offered to advance future research work.

5.1. High-resolution image inpainting

Studying high-resolution image inpainting models with low computational cost is one of the most urgent tasks today. Most current image inpainting methods still focus on low-resolution images. However, as the data era develops, low-resolution images clearly can no longer meet commercial demand.

5.2. Create a dataset based on Asian face images

How to create a dataset of Asian face images is one of the key directions for future research. Deep learning-based inpainting methods have achieved good results on face datasets, but the face datasets heavily used at present consist of European and American face images. Models trained on these datasets can produce inaccurate or even wrong results when restoring Asian faces. Therefore, collecting face datasets with Asian facial features is one of the priorities of current research.

5.3. Enabling face image inpainting in different tasks and scenarios

How to implement face image inpainting in different tasks and scenarios is a challenge that still needs to be solved. In daily life, face image inpainting has many applications, for example in public safety and face recognition. However, different tasks and scenarios bring challenges that cannot be predetermined, such as inpainting face images of people wearing masks under normalized epidemic prevention. All of these increase the difficulty of face image inpainting, so collecting and organizing broken-face image datasets for different tasks and scenarios is likely to be one of the future research hotspots.


References

[1]. RUMELHART D E, HINTON G E, WILLIAMS R J. (1985). Learning internal representations by error propagation[R]. California Univ San Diego La Jolla Inst for Cognitive Science.

[2]. RONNEBERGER O, FISCHER P, BROX T. (2015). U-net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, Munich, Germany, 234-241.

[3]. GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. (2014). Generative adversarial nets[J]. Advances in neural information processing systems, 27.

[4]. VASWANI A, SHAZEER N, PARMAR N, et al. (2017). Attention is all you need[J]. Advances in neural information processing systems, 30.

[5]. YEH R A, CHEN C, YIAN LIM T, et al. (2017). Semantic image inpainting with deep generative models[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu, HI, USA: 5485-5493.

[6]. ZHANG H, HU Z, LUO C, et al. (2018). Semantic image inpainting with progressive generative networks[C]//Proceedings of the 26th ACM international conference on Multimedia. Seoul, South Korea, 1939-1947.

[7]. ZHANG X, WANG X, SHI C, et al. (2022). De-gan: Domain embedded gan for high quality face image inpainting[J]. Pattern Recognition, 124: 108415.

[8]. ZENG Y, LIN Z, LU H, et al. (2021). Cr-fill: Generative image inpainting with auxiliary contextual reconstruction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Canada: 14164-14173.

[9]. YANG C, LU X, LIN Z, et al. (2017). High-resolution image inpainting using multi-scale neural patch synthesis[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu, HI, USA: 6721-6729.

[10]. ZENG Y, FU J, CHAO H, et al. (2022). Aggregated contextual transformations for high-resolution image inpainting[J]. IEEE Transactions on Visualization and Computer Graphics.


Cite this article

Gao,J. (2023). Generative adversarial network based image inpainting. Applied and Computational Engineering,5,93-98.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 3rd International Conference on Signal Processing and Machine Learning

ISBN: 978-1-915371-57-7 (Print) / 978-1-915371-58-4 (Online)
Editor: Omer Burak Istanbullu
Conference website: http://www.confspml.org
Conference date: 25 February 2023
Series: Applied and Computational Engineering
Volume number: Vol. 5
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
