Deep Fakes: Pushing the Limits of Visual Perception
Digitally altered photos and videos have already had devastating impacts around the globe. Detecting and exposing these fakes may well be essential to our democracy and our personal safety. UC Berkeley professor of vision science Hany Farid, one of the world’s foremost experts in digital forensics and image analysis, is developing software that can detect fakes, but the sheer volume of altered images being posted online presents an almost impossible task.
“We used to go to libraries. That’s how old I am,” says Dr. Hany Farid, Professor of Vision Science at UC Berkeley’s School of Optometry, as he recounts the old-school way he became one of the world’s foremost experts in the ultra-modern field of digital forensics and image analysis. While waiting in the checkout line he picked up a book (a 500-page insomnia cure called The Federal Rules of Evidence) that was languishing on a return cart. He flipped through randomly, landing on a page entitled “Rules for Introducing Photographs Into Evidence.” Farid was studying photo imagery as a postdoc, so his curiosity was piqued. “It was almost a footnote,” he recalls. “And this was in 1997, so film still dominated, though digital cameras were just coming around. But the book said that for the purposes of the federal court system, they were treating a digital photograph and a 35 mm negative as exactly the same. And I thought, huh…I’m not really good at predicting the future, but this is going to be a problem.” With digital photography still in its infancy, Farid’s advisor was less than enthusiastic when Farid suggested that he dig into the question of how photos could be conclusively authenticated.
“But the idea kind of sat with me,” says Farid, who, in addition to his work at Berkeley Optometry, has joint appointments in Electrical Engineering & Computer Science and the School of Information. “And I had this tennis buddy at the time and he was always beating me, so I hated him.” Using an early version of Photoshop, Farid took an image of his friend and swapped in the face of a famous tennis pro. “And when I did that manipulation, I realized that it was going to leave behind an artifact that I could quantify and measure.” He remembered the vagueness he’d noted in the rules of evidence and realized that any photo manipulation would leave a specific statistical signature, a fingerprint that could bring photographic verification into the modern era.
Fast forward twenty years, and every person on earth is bombarded by thousands of photo and video images every day. Seeing is believing, as the adage goes, but what does that mean when anybody with a laptop can make hyper-realistic alterations that are undetectable to the casual internet user scrolling and swiping her way through the day’s news? “Stalin manipulated photographs,” Farid says. “Hitler did it. Mao did it. There’s power in visual imagery. You change images and you change history.” With people already shouting “fake news!” about anything they merely disagree with, the potential for governments (or basement-dwelling provocateurs) to produce compelling fake images is chilling. Digital forgeries are already in circulation and will inevitably become more convincing and widespread. Fake news won’t just be a rallying cry, but a conscious political or criminal technique with immense power.
The human eye and cognitive system are not particularly good at detecting fakes. And why should they be? A tree is a tree and a tiger is a tiger; there was no selective pressure on our prehistoric ancestors to guard against the natural world launching forgeries at them. “What do you actually care about, visually?” asks Marty Banks, Professor of Optometry and Vision Science at Berkeley Optometry. “You want to know where the door is so you can run out of it if you need to. And you want to know who I am when you’re sitting next to me. But you don’t care if the light is coming from overhead or outside.” Banks gestures towards a window and says “the light out there is probably a thousand times greater than it is in here, but to your eye it looks maybe twice as strong. And who cares? All you need to know is where the building is so you don’t accidentally run into it. You don’t care where the light is coming from or what the shadows are doing. Visual processes have evolved to be very good at what and where and then we just ignore these nuisance variables.”
“We don’t go around understanding the physics of light and shadows and reflections and perspective geometry,” Farid says, echoing Banks. “We get ourselves around the world in a safe way, but when it comes to analyzing images we have to ask questions like, ‘are the shadows consistent? Is the geometry consistent? Are the physics consistent?’” Farid’s research shows that people are pretty terrible at distinguishing the real from the fake. “A huge proportion of the human brain is dedicated to visual processing and from an early age we learn about the world through visual imagery,” he says. “Surely when we’re presented with a fake image, you’d think people would notice. But no. We add all kinds of inconsistencies to photos and people just can’t tell.” A forger doesn’t need to get every detail perfect; he just needs to make his fake good enough to look momentarily convincing as it zooms through our social media feeds alongside cat videos and vacation pictures.
There are a few different categories of fakes, Farid explains, the easiest of which — a misattribution fake — requires no technical skill whatsoever. “This is where somebody takes a photo of a bombed-out building from Syria five years ago and captions it that this just happened in Afghanistan.” Misattribution fakes are widespread and have been used to substitute one border crossing for another, inflate or deflate crowd sizes at protests or political rallies, or even convince people that a photo of a natural disaster from one part of the world was actually taken thousands of miles away.
Moving up the scale of complexity is the kind of photo alteration easily done in Photoshop: replacing one person’s face with another, creating a composite of two people standing next to each other when in reality they’ve never met. This is great when you want to put Uncle Fred into the family reunion photo that he was late for, but takes a darker turn when used by political opportunists, like whoever altered a photo of a young Barack Obama to make him appear as a machine-gun-toting member of the Black Panther Party. As far back as 2006, Reuters fired a photographer after he used Photoshop to enhance smoke effects, making a bombing in Beirut look worse than it was. More recently, somebody with just a modicum of technical skill took a video of Speaker of the House Nancy Pelosi and slowed it down to about 75% of its original speed. The resulting clip of her, seemingly drunk and slurring her words, was shared on social media hundreds of thousands of times.
Which brings us to the future of digital forgery: deep fakes. With a deep fake—technically known as Artificial Intelligence Synthesized Content—the manipulator can create a realistic video likeness of a person and then make that facsimile say or do something that the real person never would. A widely-known politician such as Donald Trump or Hillary Clinton has thousands of hours of publicly available video and audio of how they look and speak; using those real images as a database, forgers build a library of facial movements, speech patterns, and hand gestures that can later be forced into service in the absence of the original human. “Let’s say I want to create a fake where I replace your face with my face,” Farid says. “A synthesizer algorithm generates the image and then a detector algorithm looks it over in what’s called a Generative Adversarial Network.” The detector tells the synthesizer if it can distinguish the fake image from the real images, forcing the synthesizer to create an increasingly realistic image “really rapidly, to the tune of tens of millions of iterations and then finally the detector is satisfied and now you have highly realistic fake content.”
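For the technically curious, the adversarial loop Farid describes can be sketched in a few dozen lines of code. The toy example below, written in Python with PyTorch, uses simple one-dimensional numbers as a stand-in for images, and its network sizes and settings are made up for illustration; it is a sketch of the general idea, not anyone’s actual deep-fake pipeline.

    # A "synthesizer" (generator) learns to produce samples that a "detector"
    # (discriminator) cannot tell apart from real data. Toy 1-D data stands in
    # for images; all sizes and learning rates here are illustrative guesses.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def real_batch(n=128):
        # "Real" data: a Gaussian distribution the synthesizer must learn to mimic.
        return torch.randn(n, 1) * 0.5 + 2.0

    synthesizer = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    detector = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
    opt_s = torch.optim.Adam(synthesizer.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(detector.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    for step in range(5000):
        # 1) Train the detector to tell real samples from synthesized ones.
        real = real_batch()
        fake = synthesizer(torch.randn(128, 8)).detach()
        d_loss = loss_fn(detector(real), torch.ones(128, 1)) + \
                 loss_fn(detector(fake), torch.zeros(128, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # 2) Train the synthesizer to fool the detector.
        fake = synthesizer(torch.randn(128, 8))
        s_loss = loss_fn(detector(fake), torch.ones(128, 1))
        opt_s.zero_grad(); s_loss.backward(); opt_s.step()

    # After training, the synthesizer's output should approach the real
    # distribution (mean near 2.0, spread near 0.5).
    samples = synthesizer(torch.randn(1000, 8))
    print(samples.mean().item(), samples.std().item())

Scale that same back-and-forth up to video frames, far larger networks, and the tens of millions of iterations Farid mentions, and the result is the highly realistic fake content he warns about.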
If your goal is to make a Hollywood blockbuster, then this technique is a goldmine. (To get an idea of the possibilities, just search out Nicolas Cage deep fakes on YouTube, and you’ll fall down a rabbit hole dug by the oddball community trying to put Cage into every movie role imaginable. He does a particularly good Indiana Jones and he’s unsettlingly decent as the Julie Andrews character in The Sound of Music.)
But the negative consequences of deep fakes are potentially catastrophic. Imagine a presidential candidate digitally “confessing” to treason or a technologically savvy stock market short-seller releasing a video of Jeff Bezos announcing his surprise retirement from Amazon. On a more local scale, non-consensual pornography is already a reality, a nightmare version of Farid’s harmless prank with his tennis partner, in which an angry boyfriend swaps the face of a woman he knows onto an adult performer’s body, then terrorizes her by sending the resulting fake video clip to her parents or her employer.
Fake news clips have already had devastating impacts around the globe. After a real attack by militants in Sri Lanka, faked social media posts incited retaliatory violence against innocent people. A faked video on WhatsApp contributed to sectarian violence in India that led to the mob killings of sixty people. And, as Farid points out, “whether you like him or not, 80,000 votes in three states was the difference between Trump and Clinton and we know that millions of people saw fake news from Russia. You don’t think that could have affected 80,000 votes?”
Detecting and exposing these fakes may well be essential to our democracy and our personal safety. “I like the idea of a Good Housekeeping Seal of Approval” for photos and videos, says Dr. Banks. “You go through the whole pipeline of how an image was captured, encoded, and processed. If we go through and check for landmarks, then we can stamp it authentic if the image obeys certain rules.” Banks ticks off a list of pitfalls for potential forgers: light reflecting off of eyes, geometry of shadow casting, compression of focal lengths, watermarks left behind when pixels in a JPEG are rearranged. In theory, internet users could be trained to trust only images that are verified by a reliable source, much the way we’ve come to look for the organic label on vegetables.
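One of the simplest checks in that spirit, offered here as a standard classroom example rather than as Banks’s or Farid’s own tooling, is error level analysis: re-save a JPEG at a known quality and look at where the image differs most from its re-saved copy, since regions with a different compression history (a pasted-in face, for instance) can stand out. A minimal Python sketch using the Pillow library, with placeholder file names:

    # Error level analysis: amplify the difference between an image and a
    # freshly re-compressed copy of itself. Roughly uniform noise is expected;
    # bright, localized regions *may* indicate content with a different history.
    from PIL import Image, ImageChops

    def error_level_analysis(path, quality=90, scale=15):
        original = Image.open(path).convert("RGB")
        original.save("_resaved.jpg", "JPEG", quality=quality)  # re-compress
        resaved = Image.open("_resaved.jpg")
        diff = ImageChops.difference(original, resaved)
        # Multiply the per-pixel difference so faint patterns become visible.
        return diff.point(lambda value: min(255, value * scale))

    # error_level_analysis("suspect_photo.jpg").save("ela_map.png")

Like every single check, this one is easy to fool and prone to false alarms, which is exactly why Farid’s full analyses take hours rather than milliseconds.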
Which all sounds great, but “forensically examining a billion images a day is impossible,” Farid explains. “I can sit at my desk for a couple of hours and analyze a video and tell you if it’s real. But it doesn’t work at scale, when you’ve got milliseconds to do the analysis.” The problem is that nearly all pictures are altered in some way (cropped, color-corrected, red-eye-reduced), so the real challenge is determining whether an alteration is harmless or nefarious.
Moreover, every safeguard enacted by scholars like Farid and Banks will immediately be reverse-engineered and eclipsed by forgers. “I’m going to lose,” Farid says. “Because playing defense is always harder than playing offense.” Well-resourced government hackers in Russia and China will always be able to win the digital arms race, “and so my job,” Farid continues, “is to make sure that it’s really a small number of people in the world who can do this well. Because when any knucklehead on Reddit can do this, then we really have a problem.”
In Farid’s opinion, the solution is for the corporate titans of the digital age—Google, Facebook, Apple, etc.—to take up the moral responsibility of becoming good global citizens. For example, the built-in cameras on cell phones could use “control capture systems,” in which every photo is stamped with a mathematical signature, which would then be encrypted and uploaded to the blockchain, an immutable distributed ledger against which the photo could forever be verified. Facebook could, if it wanted, promote only material that is verified and authenticated, which would encourage people to post trusted content if they want their posts to be more visible in their followers’ feeds. But would Facebook agree to more accuracy if it caused content to become more boring, less clickable, less profitable? “The companies all say ‘well, this is a really hard problem,’” Farid says. “But you know what? They’ve solved a lot of hard problems over the years, so don’t tell me this is too hard.”
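As a rough illustration of how such a control-capture scheme could work, the sketch below hashes a photo’s bytes at capture time, signs the hash, and appends the record to a log; verifying the photo later just means re-hashing the file and checking it against that record. The signing key, the plain Python list standing in for the blockchain, and the function names are all illustrative assumptions, not any vendor’s actual system.

    # Minimal sketch of hash-at-capture verification. A list stands in for an
    # immutable distributed ledger; a fixed key stands in for device-bound keys.
    import hashlib, hmac, time

    SIGNING_KEY = b"hypothetical-device-secret"
    ledger = []  # stand-in for the blockchain described above

    def register_photo(image_bytes: bytes) -> dict:
        digest = hashlib.sha256(image_bytes).hexdigest()
        signature = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
        entry = {"digest": digest, "signature": signature, "time": time.time()}
        ledger.append(entry)
        return entry

    def verify_photo(image_bytes: bytes) -> bool:
        digest = hashlib.sha256(image_bytes).hexdigest()
        expected = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
        return any(e["digest"] == digest and
                   hmac.compare_digest(e["signature"], expected)
                   for e in ledger)

    photo = b"...raw bytes straight off the camera sensor..."
    register_photo(photo)
    print(verify_photo(photo))                        # True: matches the record
    print(verify_photo(photo + b"one altered byte"))  # False: any edit breaks the match

In a real deployment the records would live on a distributed ledger and the signing key would be bound to the camera hardware, but the verification logic stays essentially this simple.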
Twenty-plus years ago, when Hany Farid picked up an actual book from an actual shelf, it probably never crossed his mind that something titled Federal Rules of Evidence would be anything other than the genuine rules that govern evidence in federal courts. And why should he have been suspicious? But today, says Farid, “over half of the content you see online is either generated by bots or is simply not true.” In a world where everything has the potential to be fake, the entire notion of truth and trustworthiness goes out the window. While long-term solutions may be elusive, there are some immediate steps each of us can take to protect ourselves from society’s forgery-inspired downfall: “Delete Facebook and Twitter immediately. It’s good for the world and you’ll also be a lot happier.” And, Farid says, half-jokingly, “stockpile food and water.”
-30-