Project #MovingStills Turns Still Photos into Video
Long Mai demonstrates Project #MovingStills on stage at Adobe MAX 2018.
Take a snapshot and you capture a frozen moment in time. It’s been the preferred way of preserving memories for decades. Pictures are powerful and effective at capturing emotion. So, it’s no surprise that almost everyone carries a smartphone with a camera and it feels like photography is everywhere.
And yet, we also live in a world of ubiquitous video, an environment where photography can seem flat by comparison. That poses a dilemma for marketers and creatives. Well-shot video is especially engaging, but it’s also more difficult and expensive to produce, procure, edit and distribute. What if there were a way to take the simplicity and convenience of photography and enhance it with the visual excitement and motion of video?
https://www.youtube.com/watch?v=dM-lX9c3Pqw
That’s the idea behind Project #MovingStills, a Sneak technology from Adobe. “Our goal was simple. We wanted to figure out how we could take the still image and make it more visually interesting and engaging to watch,” says Long Mai, a research scientist at Adobe. #MovingStills uses AI technology powered by Adobe Sensei to generate extremely realistic, video-like motion across, or into, a still photo. The work was a collaboration among Long Mai, Jimei Yang, and Simon Niklaus.
Long Mai notes that it’s not the first effort to add motion and visual interest to photographs. In the pre-digital era, documentarian Ken Burns famously pioneered the technique of adding motion to historical photographs by passing the camera over the image, adding a gentle pan or zoom for visual interest.
More recently, a technique known as 2.5D parallax became popular, allowing creatives using Adobe Photoshop to isolate objects in a photo and generate differentiated movement between foreground subjects and background, creating a layered depth illusion.
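As a rough illustration of why layered movement reads as depth, here is a minimal sketch of the parallax idea. It is not taken from Photoshop or from #MovingStills; the function name `layer_offsets`, the depth values and the `reference_depth` parameter are all hypothetical, and the only assumption is that a layer's on-screen shift is inversely proportional to its distance from the camera.

```python
def layer_offsets(layer_depths, camera_shift_px, reference_depth=1.0):
    """Return the on-screen shift, in pixels, for each layer of a 2.5D scene.

    Assumption for this sketch: a layer at `reference_depth` shifts by
    exactly `camera_shift_px`, and every other layer shifts in inverse
    proportion to its depth, so nearer layers move more than distant ones.
    """
    return [camera_shift_px * reference_depth / depth for depth in layer_depths]


# Hypothetical three-layer scene: subject, mid-ground, background.
offsets = layer_offsets([1.0, 2.0, 8.0], camera_shift_px=16.0)
print(offsets)
```

In this toy scene the subject moves eight times further than the background, and that difference in movement is what produces the layered depth illusion.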
What makes #MovingStills unique is that it uses AI to estimate and recreate the actual 3D geometry and depth map of a scene, then synthesizes a camera path through it. The effect is astoundingly realistic motion video, generating new pixels for parts of the scene that were hidden behind occlusions and are revealed as the camera moves, or adding life-like perspective to depth changes as the camera moves forward or backward in a scene.
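To make the role of the depth map concrete, the following is a toy sketch of depth-based warping on a single row of pixels. It is not Adobe's method: the function `warp_scanline`, the shift rule (camera shift divided by depth) and the sample data are assumptions chosen for illustration. It shows how moving the camera leaves holes where the background was previously hidden, which are the regions a system like the one described has to fill with new pixels.

```python
def warp_scanline(colors, depths, camera_shift_px):
    """Forward-warp one image row to a sideways-shifted virtual camera.

    Assumption for this sketch: each pixel moves by camera_shift_px / depth
    (rounded to whole pixels), so near pixels move further than far ones.
    A z-buffer keeps the nearest pixel when two land on the same target.
    Targets that nothing lands on stay None: these are the disoccluded
    regions that a synthesis step would need to fill in.
    """
    width = len(colors)
    out = [None] * width
    zbuf = [float("inf")] * width
    for x, (color, depth) in enumerate(zip(colors, depths)):
        target = x + round(camera_shift_px / depth)
        if 0 <= target < width and depth < zbuf[target]:
            out[target] = color
            zbuf[target] = depth
    return out


# Toy row: a near object "F" (depth 1) in front of a far background "b" (depth 8).
row = ["b", "b", "F", "F", "b", "b", "b", "b"]
depths = [8.0, 8.0, 1.0, 1.0, 8.0, 8.0, 8.0, 8.0]
warped = warp_scanline(row, depths, camera_shift_px=2.0)
print(warped)
```

The near object slides two pixels while the background stays put, so the two positions it used to cover come back as `None`. A simple layered parallax effect has no information for those gaps; the article's point is that #MovingStills generates plausible content for them.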
“It was a very challenging problem to solve,” Long Mai explains. “In order to train AI to generate a predictive depth model, we needed to be able to construct a data set that had thousands of views of different scenes from different angles. The breakthrough came when we realized we could use a 3D rendering engine (traditionally used for creating video games) to create our own virtual scenes and then walk through them with a virtual camera, taking pictures of everything from different angles. It provided the extensive data we needed.”
At Adobe MAX 2018, Long Mai showed the audience how #MovingStills could generate video-like camera movement with portraits, landscapes and interiors. He also showed how the technology could be paired with controls that let creatives determine the camera path through the scene, or generate moving slideshows from a collection of images with just a click.
#MovingStills is a technology that promises to supercharge photography, engage audiences and bring a renewed sense of life and depth to historical images, but Long Mai still isn’t satisfied. In part that’s due to his experience with previous projects.
He first started working with Adobe in 2016 as a research intern and joined Adobe full time in 2017. His first research project involved working on #ConceptCanvas, a technology for adding creative visual and spatial search features to Adobe Stock. “It gave me a new appreciation for the complexity of spatial relationships in a given image, and the level of specificity and control creatives need in their workflow,” he says.
Those same lessons guide his goals for #MovingStills. First, he wants to optimize the technology to work faster. “Depth estimation is still a really challenging process and it takes a lot of GPU power and time to build the model. We want to get to a place where it can work in seconds, rather than minutes,” he explains.
He’d also like to add more specific control to the camera path and features for animating other aspects of the scene. “We want to move from three degrees of freedom to six degrees of freedom within the generated scene. If we can add some more parallax effect and animate features like water, clouds, leaves or human hair, we can potentially start to generate realistic 3D environments for virtual reality or other immersive technologies just by starting with a 2D photo,” he muses.