The goal of this project is to reimplement Ghost with face detection and recognition models other than InsightFace's RetinaFace and ArcFace, in order to allow a more permissive license than the InsightFace one. It includes a full rewrite of the original Ghost repository code, integrating PyTorch Lightning to speed up training and using datasets other than VGGFace2.
Here is the ethics charter written by the original authors of Ghost, which still holds today:
"Deepfake stands for a face swapping algorithm where the source and target can be an image or a video. Researchers have investigated sophisticated generative adversarial networks (GAN), autoencoders, and other approaches to establish precise and robust algorithms for face swapping. However, the achieved results are far from perfect in terms of human and visual evaluation. In this study, we propose a new one-shot pipeline for image-to-image and image-to-video face swap solutions - GHOST (Generative High-fidelity One Shot Transfer).
Deep fake synthesis methods have been improved a lot in quality in recent years. The research solutions were wrapped in easy-to-use API, software and different plugins for people with a little technical knowledge. As a result, almost anyone is able to make a deepfake image or video by just doing a short list of simple operations. At the same time, a lot of people with malicious intent are able to use this technology in order to produce harmful content. High distribution of such a content over the web leads to caution, disfavor and other negative feedback to deepfake synthesis or face swap research.
As a group of researchers, we are not trying to denigrate celebrities and statesmen or to demean anyone. We are computer vision researchers, we are engineers, we are activists, we are hobbyists, we are human beings. To this end, we feel that it's time to come out with a standard statement of what this technology is and isn't as far as us researchers are concerned.
- GHOST is not for creating inappropriate content.
- GHOST is not for changing faces without consent or with the intent of hiding its use.
- GHOST is not for any illicit, unethical, or questionable purposes.
- GHOST exists to experiment and discover AI techniques, for social or political commentary, for movies, and for any number of ethical and reasonable uses.
We are very troubled by the fact that GHOST can be used for unethical and disreputable things. However, we support the development of tools and techniques that can be used ethically as well as provide education and experience in AI for anyone who wants to learn it hands-on. Now and further, we take a zero-tolerance approach and total disregard to anyone using this software for any unethical purposes and will actively discourage any such uses."
We understand the unethical potential of GhostV2 and are committed to protecting against such behavior. The repository has been modified to prevent the processing of inappropriate content, including nudity, graphic content, and sensitive content. Collaboration with websites that promote the use of unauthorized software is strictly prohibited. Those who intend to engage in such activities will be subject to repercussions, such as being reported to authorities for violating the law.
- Clone this repository
```bash
git clone https://github.com/dimitribarbot/ghostv2.git
cd ghostv2
```
- Install dependent packages
```bash
pip install -r requirements.txt
```
- Download weights
To download only the models needed for inference, run this script from the root folder of the repository:
```bash
sh download_inference_models.sh
```
To download all the models needed for inference, dataset preprocessing and training, run this script from the root folder of the repository:
```bash
sh download_all_models.sh
```
For the moment, face swap only works for single images containing a single face (if there are multiple faces, the first one is used, with faces sorted by left eye and then right eye coordinates).
Run inference using our GhostV2 pretrained model by specifying the path to a source file containing the face to be swapped and the path to the target file into which it will be swapped. The output image will be created at the given output file path:
```bash
python inference.py --source_file_path={PATH_TO_IMAGE} --target_file_path={PATH_TO_IMAGE} --output_file_path={PATH_TO_IMAGE}
```
Note that an NSFW filter has been added to prevent the creation of malicious content.
By default, after main model inference, an enhancement step using the GFPGAN v1.4 model is performed, followed by a face paste-back step. For this last step, we provide multiple options:
- `ghost`: adapted from GhostV1, this version uses FaceAlignment to get facial landmarks in the source and target images in order to paste the output face into the target image. This option is the default.
- `facexlib_with_parser`: the code was largely inspired by facexlib. This version uses face-parsing.PyTorch internally to parse the output face and paste it into the target image.
- `facexlib_without_parser`: the code was largely inspired by facexlib. This version only uses code to paste the output face into the target image.
- `insightface`: the code was largely inspired by insightface. This version only uses code to paste the output face into the target image.
- `basic`: this option directly uses the output of the main model inference to paste the output face into the target image.
- `none`: no paste back is done; the returned image is the swapped face only (256x256), not the face swapped into the target image.
Optionally, after paste back, an extra step may be performed when choosing either the `ghost` or `facexlib_with_parser` paste-back option: we propose to inpaint face edges using the SDXL inpainting model to improve the output results.
All command line optional parameters can be found in this argument file.
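For example, to select the `facexlib_with_parser` variant instead of the default `ghost` one, the call might look like the following sketch. The flag used to choose the paste-back variant is an assumption here (check the argument file above for the exact name); the source, target and output flags are the documented ones:
```bash
# --paste_back_mode is a hypothetical flag name used for illustration only;
# see the inference argument file for the actual parameter.
python inference.py \
  --source_file_path={PATH_TO_IMAGE} \
  --target_file_path={PATH_TO_IMAGE} \
  --output_file_path={PATH_TO_IMAGE} \
  --paste_back_mode=facexlib_with_parser
```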
It is still possible to run inference with the original version of Ghost for comparison. To do that, first run the download_all_models.sh script and then run the inference script with the following parameters:
```bash
--G_path=./weights/GhostV1/G_unet_2blocks.safetensors
--face_embeddings=arcface
--align_mode=insightface_v1
```
Note however that the ArcFace model used internally follows the InsightFace license.
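Putting these together, a full comparison run with the original GhostV1 generator looks like this:
```bash
python inference.py \
  --source_file_path={PATH_TO_IMAGE} \
  --target_file_path={PATH_TO_IMAGE} \
  --output_file_path={PATH_TO_IMAGE} \
  --G_path=./weights/GhostV1/G_unet_2blocks.safetensors \
  --face_embeddings=arcface \
  --align_mode=insightface_v1
```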
It is possible to replicate the source/target matrix of the Image Swap Results section by running the following script:
```bash
python demo.py
```
Internally, it uses the same command line parameters as for inference. Options can be found in this argument file.
We provide scripts to prepare the datasets used for training. We mainly use two datasets for our training stage:
- Laion Face dataset: this dataset contains 50 million images with faces. For our pretrained model, we only downloaded the first part out of the 32 parts it contains.
- Lagenda dataset: originally used for age and gender recognition tasks, this dataset is well suited for our face swap task. It can be used to train a model faster than with the Laion-Face dataset.
We experimented a lot with dataset preprocessing and came up with the following solution:
- We exclude images that are too small and images containing faces that are too small (a minimal sketch of this filter is shown after this list),
- We use FaceAlignment and Live Portrait landmark models and code to exclude faces which are not fully visible,
- We use Live Portrait to generate various versions of the same face with random facial expressions,
- We optionally use GFPGAN v1.2 to enhance face quality.
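As a rough illustration of the size filtering step, here is a minimal sketch that uses facenet-pytorch's MTCNN as a stand-in detector; the actual preprocessing scripts rely on RetinaFace, FaceAlignment and Live Portrait, and the thresholds below are purely illustrative:
```python
# Illustrative sketch of the "exclude images/faces that are too small" filter.
# MTCNN is used here as a stand-in detector; thresholds are arbitrary examples.
from pathlib import Path

from PIL import Image
from facenet_pytorch import MTCNN

MIN_IMAGE_SIDE = 256  # illustrative value, not the repository's setting
MIN_FACE_SIDE = 128   # illustrative value, not the repository's setting

detector = MTCNN(keep_all=True)

def keep_image(path: Path) -> bool:
    image = Image.open(path).convert("RGB")
    # Exclude images that are too small.
    if min(image.size) < MIN_IMAGE_SIDE:
        return False
    boxes, _ = detector.detect(image)
    if boxes is None:
        return False
    # Keep the image only if at least one detected face is large enough.
    return any(min(x2 - x1, y2 - y1) >= MIN_FACE_SIDE for x1, y1, x2, y2 in boxes)

if __name__ == "__main__":
    kept = [p for p in Path("./raw_images").glob("*.jpg") if keep_image(p)]
    print(f"{len(kept)} images pass the size filters")
```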
Specific arguments for the Laion-Face dataset preprocessing, such as the dataset location, can be found in this argument file. Specific arguments for the Lagenda dataset preprocessing, such as the dataset location, can be found in this argument file.
N.B.: For the Laion Face dataset, you may want to download it using the explanations given here.
We tried several alignment techniques while preprocessing the datasets, and we found that the latest version of the InsightFace alignment code gives the best results. The list of the distinct alignment techniques and other preprocessing parameters can be found in this argument file.
It is also possible to compare the various alignment modes by running the following command:
For a single image:
```bash
python align.py --source_image={PATH_TO_IMAGE} --aligned_folder={OUTPUT_PATH} --align_mode={ALIGN_OPTION}
```
Or for an entire folder:
```bash
python align.py --source_folder={PATH_TO_IMAGES} --aligned_folder={OUTPUT_PATH} --align_mode={ALIGN_OPTION}
```
All alignment command line parameters can be found in this argument file.
You may want to convert your images to a common .jpg or .png format. To do this for a single file or recursively on a large number of images, you can use the following script:
For a single image:
```bash
python convert.py --source_image={PATH_TO_IMAGE} --output_folder={OUTPUT_PATH} --output_extension={EXTENSION_OPTION}
```
Or for an entire folder:
```bash
python convert.py --source_folder={PATH_TO_IMAGES} --output_folder={OUTPUT_PATH} --output_extension={EXTENSION_OPTION}
```
Where EXTENSION_OPTION is either .png or .jpg.
All conversion command line parameters can be found in this argument file.
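For context, the conversion itself essentially boils down to re-saving each image with the target extension, as in this illustrative sketch (the repository's convert.py handles folders, recursion and the command line options; the paths below are hypothetical):
```python
# Illustrative sketch of a single-file format conversion with Pillow.
from pathlib import Path

from PIL import Image

source = Path("input/photo.webp")   # hypothetical input path
output_folder = Path("converted")   # hypothetical output folder
output_folder.mkdir(exist_ok=True)

image = Image.open(source).convert("RGB")
image.save(output_folder / (source.stem + ".jpg"))
```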
To train GhostV2, you can run the following script:
```bash
python train.py
```
We provide a lot of different options for training.
Internally, we detect faces using the Pytorch RetinaFace model. We then compute face embeddings using one of the available face recognition models:
- The original ArcFace model, used by the initial version of Ghost (beware, the model is available for non-commercial research purposes only),
- AdaFace, a competing model to ArcFace,
- CVLFace, by the author of AdaFace, which offers various face recognition models,
- Facenet Pytorch, the PyTorch version of David Sandberg's TensorFlow facenet.
By default, we use ViT AdaFace, which appears to give the best results, especially in terms of identity preservation.
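As an illustration of what the embedding step computes, here is a minimal sketch using the Facenet Pytorch back-end listed above (the training code wires this differently; the random tensor stands in for an aligned 160x160 face crop):
```python
# Minimal sketch of computing a 512-d face embedding with facenet-pytorch,
# one of the recognition back-ends listed above. Preprocessing is illustrative.
import torch
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained="vggface2").eval()

# Stand-in for an aligned 160x160 face crop, normalized to [-1, 1].
face = torch.rand(1, 3, 160, 160) * 2 - 1

with torch.no_grad():
    embedding = model(face)  # shape: (1, 512)

print(embedding.shape)
```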
More information regarding each option can be found in this argument file. If you want to use wandb logging for your experiments, you should first log in to wandb with `wandb login`.
N.B.: The `--example_images_path` parameter must point to a folder containing test images cropped using the same alignment method as the one used to generate your training dataset.
It is possible to calculate the distance between embeddings computed using distinct face recognition models, distinct face alignment modes, or both. This is useful if you want to know whether you can replace the face recognition model or face alignment algorithm used with a given face swap model.
To do this, you can run the following script:
```bash
python embedding_distance.py
```
And play with the `--source_face_embeddings`, `--target_face_embeddings`, `--source_crop_size`, `--target_crop_size`, `--source_align_mode` and `--target_align_mode` parameters.
All command line parameters can be found in this argument file.
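For context, the comparison essentially amounts to measuring a distance (for example a cosine distance) between two embeddings of the same face produced by different models or alignment modes, as in this illustrative snippet (not the script's actual code):
```python
# Illustrative cosine distance between two embeddings of the same face,
# e.g. one from ArcFace and one from AdaFace (random tensors used here).
import torch
import torch.nn.functional as F

source_embedding = torch.rand(1, 512)
target_embedding = torch.rand(1, 512)

cosine_similarity = F.cosine_similarity(source_embedding, target_embedding)
cosine_distance = 1.0 - cosine_similarity
print(cosine_distance.item())
```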
Our pretrained model was trained on a single RTX 4090 card using FP16 mixed precision and the Laion-Face dataset preprocessed as explained in the Dataset Preprocessing section above (around 300,000 faces, each one with 10 distinct facial expressions, using insightface_v2 as the alignment algorithm), together with the CVL ViT face embedding model.
It consisted of two phases:
- A first run of 4 epochs (~20 hours) with a batch size of 32 and default parameters set in the training arguments file (no scheduler),
- A second run of 1 epoch (~20 hours as well) with a batch size of 16 (due to the 24 GB memory limit of the RTX 4090 card), with `--eye_detector_loss` enabled, `--weight_id=70` and `--weight_eyes=1200`, and using the G and D files of the previous run. We also used a scheduler for both the G and D models by setting the `--use_scheduler` flag and the default scheduler parameters of the training arguments file (an illustrative command is sketched after the tips below).
- In case of finetuning, you can vary the loss coefficients to make the output look more similar to the source identity or, vice versa, to preserve the features and attributes of the target face.
- You can change the backbone of the attribute encoder and the number of AAD ResBlk blocks using the `--backbone` and `--num_blocks` parameters.
- During the finetuning stage, you can use our pretrained weights for the generator and discriminator, located in the `weights` folder. We provide weights for models with a U-Net backbone and 2 blocks in the AAD ResBlk.
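As an illustration, a second-phase or finetuning run along the lines described above might be launched as follows. The loss, scheduler, backbone and block flags are the documented ones; the batch size flag, the checkpoint flags and the `unet` backbone value are assumptions, so check the training argument file for the exact names:
```bash
# --batch_size, --G_path, --D_path and the "unet" value are assumed here for
# illustration only; the other flags are documented in the text above.
python train.py \
  --batch_size=16 \
  --eye_detector_loss \
  --weight_id=70 \
  --weight_eyes=1200 \
  --use_scheduler \
  --backbone=unet \
  --num_blocks=2 \
  --G_path={PATH_TO_PREVIOUS_G_CHECKPOINT} \
  --D_path={PATH_TO_PREVIOUS_D_CHECKPOINT}
```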
The outputs are not as good as InsightFace's, and post-processing is needed to achieve the best results.
Currently, we propose two optional post-processing steps:
- Face restoration using GFPGAN v1.4 (or v1.2),
- Face edge inpainting using the diffusers SDXL inpainting model.
Here are comparisons with and without post-processing:
- Without face restoration and without face edge inpainting:
- Without face edge inpainting but with face restoration:
- Without face restoration but with face edge inpainting:
- With face restoration and face edge inpainting:
This project can still be improved. Here is a list of known open topics:
- Add video face swap as in the original Ghost repository.
- Use Pytorch Lightning CLI to train the model using various configurations.
- Create an onnx version of the pretrained model.
- And of course, improve the face swap result!
The pretrained models and source code of this repository are under the BSD 3-Clause license.
| file | source | license |
|---|---|---|
| GhostV2 Discriminator | dimitribarbot/ghostv2 | |
| GhostV2 Generator | dimitribarbot/ghostv2 | |
The models and code used in this repository are:
The datasets used in this project are:
| dataset | source | license |
|---|---|---|
| LAION-Face | FacePerceiver/LAION-Face | |
| Lagenda | WildChlamydia/Lagenda | |
Thanks to everyone who makes this project possible!