This article is part of a special issue of VB. Read the full series here: AI and Security.
The number of deepfakes – media in which AI is used to swap the person in an existing photo, audio clip, or video for someone else – is growing rapidly. This is worrying not only because such fakes could be used to sway opinion during an election or implicate someone in a crime, but also because they have already been abused to generate fake pornography and to defraud the director of a British energy company.
Anticipating this new reality, a coalition of academic institutions, technology firms, and nonprofits is developing ways to identify misleading AI-generated media. Their work suggests that detection tools are a viable solution only in the short term – the deepfake arms race is just beginning.
Not long ago, the best prose AI could produce had more in common with Mad Libs than with The Grapes of Wrath, but modern language models can now write text that approaches human writing in fluency and persuasiveness. GPT-2, a model released by San Francisco-based research firm OpenAI, takes just seconds to produce passages in the style of New Yorker articles or scripts for the game Brainstorm. Researchers at the Middlebury Institute's Center on Terrorism, Extremism, and Counterterrorism have warned that GPT-2 and similar models could be tuned to advocate white supremacy, jihadist Islamism, and other threatening ideologies – which raises even more concern.
Above: A frontend for GPT-2, a trained language model from research firm OpenAI.
Image Credit: OpenAI
In search of a system capable of detecting synthetic content, researchers at the University of Washington's Paul G. Allen School of Computer Science and Engineering and the Allen Institute for Artificial Intelligence developed Grover, an algorithm they claim identified 92% of deepfakes in a test set drawn from the open Common Crawl corpus. The team attributes this success to the fact that Grover writes text itself, which they say helps it learn the telltale quirks of AI-generated language.
A team of scientists from Harvard and the MIT-IBM Watson AI Lab separately released the Giant Language Model Test Room (GLTR), a web tool that attempts to determine whether a passage was written by an AI model. Given a semantic context, it predicts which words are most likely to appear next, essentially writing text of its own. If the words in the sample under test rank among the model's 10, 100, or 1,000 most likely choices, they are highlighted green, yellow, or red, respectively. In effect, the tool uses its own predicted text as a benchmark for identifying artificially generated content.
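The GLTR approach can be sketched in a few lines. The toy bigram model below is a stand-in for a real language model such as GPT-2 – a minimal illustration of the ranking idea, not the actual tool:

```python
# GLTR-style detection sketch: rank each token by how likely a language model
# considers it given the preceding token, then bucket the ranks. A tiny bigram
# model stands in for a real LM; the corpus and sample text are invented.
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies for each context token."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def token_rank(model, prev, token):
    """Rank of `token` among predictions following `prev` (0 = most likely)."""
    ranked = [w for w, _ in model[prev].most_common()]
    return ranked.index(token) if token in ranked else len(ranked)

def bucket(rank):
    # GLTR's color buckets: green / yellow / red for top-10 / top-100 / beyond.
    return "green" if rank < 10 else "yellow" if rank < 100 else "red"

corpus = "the cat sat on the mat the dog sat on the rug".split()
model = train_bigram(corpus)
sample = "the cat sat on the rug".split()
colors = [bucket(token_rank(model, prev, tok))
          for prev, tok in zip(sample, sample[1:])]
print(colors)  # mostly green: every word is among the model's top predictions
```

Text in which most tokens land in the green bucket tracks the model's own preferences closely – exactly the signature GLTR treats as evidence of machine authorship.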
Modern video-generating AI is just as dangerous, and at least as capable, as its text-based counterpart. An academic paper published by Hong Kong-based startup SenseTime, Nanyang Technological University, and the Institute of Automation of the Chinese Academy of Sciences details a framework that edits footage using audio to synthesize realistic video. And researchers at Hyperconnect in Seoul recently developed MarioNETte, a tool that can manipulate the facial features of a historical figure, politician, or CEO by synthesizing a reenacted face animated by another person's movements.
Even the most realistic deepfakes, however, contain artifacts that give them away. "Deepfakes produced by generative systems learn from a set of real images in a video, to which new images are added, and then generate a new video from those new images," says Ishay Rosenberg, head of the deep learning group at cybersecurity company Deep Instinct. "The resulting video is subtly different because the distribution of the artificially generated data differs from the distribution of the data in the original video. These so-called 'glitches in the matrix' are what deepfake detectors are able to pick out."
Above: Two fake videos created using state-of-the-art techniques.
Image Credit: SenseTime
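Rosenberg's "distribution shift" can be illustrated with a toy example. The code below compares one crude statistic – high-frequency energy, approximated by adjacent-sample differences – between two sets of simulated 1-D "frames"; real detectors learn far richer features, and all numbers here are invented for illustration:

```python
# Sketch of distribution-shift detection: generated footage tends to follow a
# slightly different statistical distribution than real footage. Here we fake
# that effect by giving "generated" frames different noise statistics, then
# separate the two sets with a simple threshold on a summary statistic.
import random

def hf_energy(frame):
    """Mean absolute difference between adjacent pixel values."""
    return sum(abs(a - b) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)

def make_frame(rng, noise):
    # A smooth ramp plus Gaussian noise; synthesis pipelines often perturb
    # noise statistics in exactly this subtle, hard-to-see way.
    return [i / 100 + rng.gauss(0, noise) for i in range(100)]

rng = random.Random(0)
real = [hf_energy(make_frame(rng, noise=0.01)) for _ in range(50)]
fake = [hf_energy(make_frame(rng, noise=0.05)) for _ in range(50)]

real_mean = sum(real) / len(real)
fake_mean = sum(fake) / len(fake)
print(fake_mean > real_mean)  # the shifted distribution is clearly separable
```

A human eye cannot see a difference of a few hundredths in noise energy, but a statistical detector separates the two populations easily – the "glitch in the matrix" Rosenberg describes.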
Last summer, a team from the University of California, Berkeley and the University of Southern California trained a model to look for precise "facial action units" – data points on facial movements, tics, and expressions, such as raising the upper lip or the way the head rotates when a person frowns – and identified fake videos with better than 90% accuracy. Similarly, in August 2018, participants in the US Defense Advanced Research Projects Agency's (DARPA) Media Forensics program tested systems capable of detecting AI-generated video from cues such as unnatural blinking, strange head movements, unusual eye color, and more.
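A minimal sketch of the action-unit idea: describe each clip by a vector of facial-mannerism intensities and flag clips whose profile strays from the subject's baseline. The feature names and values below are invented for illustration; real systems extract them from video with facial-landmark tracking:

```python
# Nearest-centroid sketch of action-unit-based deepfake detection: build a
# baseline "mannerism profile" from authentic clips of a subject, then flag
# clips whose profile is far from that baseline. All numbers are illustrative.
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical features per clip: [lip-corner pull, brow raise, head-turn-
# while-frowning correlation], each normalized to [0, 1].
authentic_clips = [[0.80, 0.30, 0.90], [0.75, 0.35, 0.85], [0.82, 0.28, 0.88]]
baseline = centroid(authentic_clips)

def is_fake(clip_features, threshold=0.3):
    """Flag the clip if its action-unit profile is far from the baseline."""
    return distance(clip_features, baseline) > threshold

print(is_fake([0.78, 0.32, 0.87]))  # False: matches the subject's mannerisms
print(is_fake([0.20, 0.70, 0.10]))  # True: impersonation, mismatched mannerisms
```

The Berkeley/USC work uses far richer features and a learned classifier, but the underlying intuition is the same: a face swap copies appearance, not a person's idiosyncratic movements.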
Several startups are commercializing similar tools for detecting fake videos. Amsterdam-based Deeptrace Labs offers a suite of monitoring tools designed to classify deepfakes uploaded to social media, video-hosting platforms, and disinformation networks. Dessa has proposed methods for improving deepfake detectors trained on fake-video datasets. And in July 2018, Truepic raised $8 million to fund its service for detecting deepfakes in video and photos. In December 2018, the company acquired Fourandsix, a startup whose fake-image detector was licensed by DARPA.
Above: Deepfake images edited by AI.
In addition to developing fully trained systems, a number of companies have published corpora in the hope that the research community will develop new methods for detecting fakes. To accelerate this process, Facebook – along with Amazon Web Services (AWS), the Partnership on AI, and academics from several universities – is leading the Deepfake Detection Challenge, which includes a dataset of video samples labeled to indicate which were manipulated with AI. In September 2019, Google released a collection of visual fakes as part of the FaceForensics benchmark, created by the Technical University of Munich and the University of Naples Federico II. And most recently, researchers at SenseTime, together with Nanyang Technological University in Singapore, developed DeeperForensics-1.0, a face-forgery detection dataset they claim is the largest of its kind.
AI and machine learning aren't suited only to synthesizing video and text – they can copy voices, too. Numerous studies have shown that a small dataset is all that's needed to recreate a person's speech. Commercial systems such as Resemble and Lyrebird need only a few minutes of audio recordings, while sophisticated models, such as Baidu's latest Deep Voice implementation, can clone a voice from a sample just 3.7 seconds long.
Tools for detecting audio deepfakes are still scarce, but solutions are starting to appear.
A few months ago, the Resemble team released an open source tool called Resemblyzer, which uses AI and machine learning to detect deepfakes by deriving high-level representations of voice samples and predicting whether they are real or simulated. Given an audio file of speech, it creates a mathematical representation summarizing the characteristics of the recorded voice. This lets developers compare the similarity of two voices, or determine who is speaking at any given moment.
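The comparison step works roughly like this: each recording is summarized as a fixed-length, unit-norm embedding, and two voices are compared by cosine similarity (a dot product of unit vectors). The `embed()` below is a placeholder that derives a deterministic pseudo-embedding from an identifier; the real library computes its embedding from the audio with a neural network, and the file names are invented:

```python
# Sketch of embedding-based speaker comparison (Resemblyzer-style). embed() is
# a stand-in: it hashes the input to seed a reproducible random unit vector.
# Identical inputs map to identical embeddings; different inputs map to nearly
# orthogonal ones, mimicking same-voice vs. different-voice behavior.
import hashlib
import math
import random

def embed(audio_id, dim=256):
    """Deterministic pseudo-embedding: a unit vector seeded by the input."""
    seed = int(hashlib.md5(audio_id.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity(u, v):
    """Cosine similarity of two unit vectors is just their dot product."""
    return sum(a * b for a, b in zip(u, v))

same = similarity(embed("alice_take1.wav"), embed("alice_take1.wav"))
diff = similarity(embed("alice_take1.wav"), embed("bob_take1.wav"))
print(round(same, 3))  # identical recording: similarity 1.0
print(same > diff)     # different speakers score lower
```

A detector then thresholds the similarity: a claimed recording of a known voice whose embedding sits far from that voice's reference embeddings is suspect.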
In January 2019, as part of the Google News Initiative, Google released a speech corpus containing "thousands" of phrases spoken by text-to-speech models. The samples were drawn from English articles read by 68 different synthetic voices with a variety of dialects. The corpus is available to all participants of ASVspoof 2019, a competition that aims to foster countermeasures against fake speech.
Much to lose
None of these detectors has achieved perfect accuracy, and researchers haven't yet figured out how to identify who authored a fake. Deep Instinct's Rosenberg expects this to embolden bad actors to spread fakes. "Even if a deepfake created by an attacker is detected, only the deepfake itself is at risk of exposure," he said. "For the actor, the risk of being caught is minimal. And because the risk is low, there's little to deter the creation of fakes."
Rosenberg's theory is borne out by a Deeptrace report, which found 14,698 deepfake videos online during its latest count in June and July 2019 – an increase of 84% over a seven-month period. The vast majority of them (96%) were pornographic videos featuring women.
Given those figures, Rosenberg argues that companies with "a lot to lose" from deepfakes should develop and deploy deepfake detection technology in their products – technology he likens to antivirus software. And movement has begun on that front: Facebook announced in early January that it will use a combination of automated and manual systems to detect fake content, and Twitter recently proposed labeling deepfakes and removing those likely to cause harm.
Of course, the technologies underlying deepfakes are just tools, and they have great potential for good. Michael Klozer, head of Data & Trust at consulting firm Access Partnership, says the technology is already being used to improve medical diagnostics and cancer detection, fill gaps in mapping the universe, and improve the training of driverless vehicles. For that reason, he warns against blanket campaigns to block generative AI.
"Now that leaders have begun applying existing legal norms to deepfake cases, it's critical not to throw out valuable technology while getting rid of the fakes," Klozer said. "Ultimately, the case law and social norms around the use of this new technology aren't mature enough to create bright red lines delineating fair use from abuse."