FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals

Umur Aybars Ciftci1, Ilke Demir2, Lijun Yin1

1Binghamton University, 2Intel Corporation


We present a novel approach to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes. In other words, we introduce a deep fake detector. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion is that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. To prove and exploit this assertion, we first exhibit several signal transformations for the pairwise separation problem, achieving 99.39% accuracy. Second, we utilize those findings to formulate a generalized classifier for fake content, by analyzing the proposed signal transformations and corresponding feature sets. Third, we generate novel signal maps and employ a CNN to improve our traditional classifier for detecting synthetic content. Lastly, we release an "in the wild" dataset of fake portrait videos that we collected as part of our evaluation process. We evaluate FakeCatcher on both the Face Forensics dataset and our new Deep Fakes Dataset, achieving 96% and 91.07% accuracy, respectively. In addition, our approach produces a significantly superior detection rate against baselines, and does not depend on the source, generator, or properties of the fake content. We also analyze signals from various facial regions, with varying segment durations, and under several dimensionality reduction techniques.
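The biological signals in question are photoplethysmography (PPG) signals: subtle, heartbeat-driven skin-color changes that can be recovered remotely from video. As a minimal sketch of this idea (not the paper's exact pipeline, which combines several signal transformations over multiple facial regions), the snippet below extracts a crude rPPG-like signal by averaging the green channel over a face region and band-pass filtering it to the heart-rate band. The frames and the ROI are assumed to be supplied by an upstream face tracker.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_ppg_signal(frames, roi, fps=30.0):
    """Extract a crude rPPG-like signal: mean green-channel intensity
    of a face region, band-pass filtered to the heart-rate band.

    frames: (T, H, W, 3) uint8 RGB video frames
    roi:    (y0, y1, x0, x1) face region in pixel coordinates
    """
    y0, y1, x0, x1 = roi
    # Spatial mean of the green channel per frame (pulse is strongest in G).
    raw = frames[:, y0:y1, x0:x1, 1].mean(axis=(1, 2)).astype(np.float64)
    # Remove slow illumination drift.
    raw -= raw.mean()
    # Band-pass to plausible heart rates (~0.7-4 Hz, i.e., 42-240 bpm).
    nyq = fps / 2.0
    b, a = butter(3, [0.7 / nyq, 4.0 / nyq], btype="band")
    return filtfilt(b, a, raw)
```

Spatial and temporal consistency of such per-region signals (their correlations, power spectra, and so on) is the raw material for the transformations and feature sets described above.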

Frames from real videos (left) and deep fakes (right), from a small subset of our Deep Fakes Dataset.

Deep Fakes Dataset

In order to assess the generalizability of our solution, we need to evaluate it on deep fake samples found in everyday use. For this purpose, we collected and curated a dataset of "in the wild" portrait videos, called the Deep Fakes Dataset. The videos in our dataset are diverse real-world samples, varying in source generative model, resolution, compression, illumination, aspect ratio, frame rate, motion, pose, cosmetics, occlusion, content, and context. They originate from various sources such as news articles, forums, apps, and research presentations, totaling 142 videos, 32 minutes, and 17 GB. Synthetic videos are matched with their original counterparts when possible. The visuals above show a small subset of our dataset. High accuracy on the Deep Fakes Dataset substantiates that FakeCatcher is robust to all of the aforementioned artifacts found in deep fakes in the wild. The dataset is publicly released for academic use.

Data Request

The Deep Fakes Dataset is released under the Deep Fakes Academic Use License Agreement.

To download the Deep Fakes Dataset, please fill out the following Google form after reading and agreeing to our License Agreement. Upon acceptance of your request, the download link will be sent to the provided e-mail address. For any questions or feedback, please e-mail Umur Ciftci, Ilke Demir, or Lijun Yin.



Related Papers

FakeCatcher

We present an approach to detect synthesized content in the domain of portrait videos, as a preventive solution for deep fakes. Our approach exploits biological signals extracted from facial areas, based on the observation that these signals are not well preserved spatially or temporally in synthetic content. We evaluated FakeCatcher on both a state-of-the-art dataset (Face Forensics) and our newly introduced Deep Fakes Dataset, achieving 96% and 91.07% accuracy, respectively.
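To make the CNN input concrete, the sketch below assembles a PPG-map-like image: the aligned face is split into a grid, a per-cell green-channel signal is computed over a fixed window, and the normalized signals are stacked so rows index grid cells and columns index time. Grid size, window length, and the normalization are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def ppg_map(frames, face_box, grid=8, window=128):
    """Assemble a toy 'PPG map': per-cell green-channel signals over a
    fixed window, stacked so rows index face-grid cells and columns
    index time. A CNN can then classify such maps as real vs. fake.

    frames:   (T, H, W, 3) uint8 RGB frames, with T >= window
    face_box: (y0, y1, x0, x1) aligned face region
    """
    y0, y1, x0, x1 = face_box
    face = frames[:window, y0:y1, x0:x1, 1].astype(np.float64)  # green channel
    T, H, W = face.shape
    hs, ws = H // grid, W // grid
    rows = []
    for i in range(grid):
        for j in range(grid):
            cell = face[:, i * hs:(i + 1) * hs, j * ws:(j + 1) * ws]
            sig = cell.mean(axis=(1, 2))
            sig = (sig - sig.mean()) / (sig.std() + 1e-8)  # normalize per cell
            rows.append(sig)
    return np.stack(rows)  # shape: (grid * grid, window)
```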


Deep Fake Source Detection

We propose an approach not only to separate deep fakes from real videos, but also to discover the specific generative model behind a deep fake. Purely deep-learning-based approaches try to classify deep fakes using CNNs, which in effect learn the residuals of the generator. We believe that we can reveal these manipulation artifacts by disentangling them with biological signals. Our key observation is that the spatio-temporal patterns in biological signals can be conceived as a representative projection of those residuals. Our results indicate that our approach can detect fake videos with 97.29% accuracy, and the source model with 93.39% accuracy.
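Under the assumption that such signal maps carry generator-specific residual patterns, source detection reduces to multi-class image classification. The PyTorch sketch below is a hypothetical small CNN (not the paper's architecture) that maps a single-channel PPG map to logits over "real" plus one class per known generative model.

```python
import torch
import torch.nn as nn

class SourceClassifier(nn.Module):
    """Small CNN over PPG-map 'images' predicting one of num_classes
    labels: real, plus one class per generative model (illustrative)."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool to one 64-dim descriptor
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):           # x: (N, 1, cells, window)
        z = self.features(x).flatten(1)
        return self.head(z)         # logits over {real, GAN_1, ..., GAN_k}

# Usage: logits = SourceClassifier(num_classes=5)(torch.randn(8, 1, 64, 128))
```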


Eye & Gaze Based Deep Fake Detection

In this paper, we first propose several prominent eye and gaze features that deep fakes exhibit differently from real videos. Second, we compile those features into signatures, and analyze and compare those of real and fake videos, formulating geometric, visual, metric, temporal, and spectral variations. Third, we generalize this formulation to the deep fake detection problem via a deep neural network, to classify any video in the wild as fake or real. We evaluate our approach on several deep fake datasets, achieving 92.48% accuracy on FaceForensics++, 80.0% on Deep Fakes (in the wild), 88.35% on CelebDF, and 99.27% on DeeperForensics.
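As one illustration of a geometric-temporal-spectral feature family (a sketch under simplifying assumptions, not the paper's full feature set), the snippet below computes the standard eye aspect ratio (EAR) from six eye landmarks and the magnitude spectrum of its time series; landmark extraction is assumed to come from an off-the-shelf facial-landmark detector.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio (EAR) from six eye landmarks, a standard
    geometric blink feature (Soukupova & Cech, 2016). eye: (6, 2)."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal distance
    return (v1 + v2) / (2.0 * h)

def blink_spectrum(ear_series, fps=30.0):
    """Magnitude spectrum of the EAR time series; natural blinking has
    characteristic low-frequency structure that fakes may lack."""
    x = np.asarray(ear_series, dtype=np.float64)
    x -= x.mean()
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return freqs, spec
```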