Friday, February 15, 2013

What's Happening?

Each of the above videos (Toy Story 1 on the left, Toy Story 2 in the middle, and Toy Story 3 on the right) was produced by speeding up the original movie, then blending each frame of the video with many of its neighbors -- other frames forward and backward in the original movie. This concept is illustrated in a simplified form in the following image:

In the above image, you can see 5 frames from the original movie being added together. This illustrates that the frames weren't combined equally -- each frame is blended more with its near neighbors than with faraway neighbors, which is what yielded the smoothness of the resulting video.

The Process In Detail

The exact process we used to produce the above videos involved a number of steps using ffmpeg/avconv and Python with SciPy.

1. First, we extracted 1 frame per second from the original movie, using the following command:
avconv -i path/to/toystory1.avi -vsync 1 -r 1 -f image2 -an -y output/image_%07d.png
For Toy Story 1, this resulted in 4866 images in the output folder. Toy Story 2 yielded 5548 images and Toy Story 3 amounted to 6155 images.
2. Next, we ran our Python program (source code), which blends each frame with its neighbors and saves an image of the result. The blending operation is a weighted sum: each frame is added to the 150 neighboring frames on either side (forward and backward in the movie), with weights that decrease linearly with distance from the current frame. That is, the current frame has a weight of 150, the frames immediately on either side of it each have a weight of 149, the two frames flanking those each have a weight of 148, and so on, until the frames that are 150 frames away from the current frame have a weight of 0. This weighted sum of 301 frames is normalized and saved as a single frame of the output video. (By the weighted sum of images, we mean that the RGB values of each pixel in the output image are the sums of the RGB values of the pixels at that same location in each of the contributing frames, with each frame's RGB values multiplied by that frame's weight.)
3. After the Python program finishes, there should be one blended output frame for every input frame. Combine these blended frames into a movie with the following command:
avconv -f image2 -r 30 -i path/to/python/script/output/frame%04d.png -filter:v setpts=0.5*PTS -vcodec libx264 output/toystory1.mp4
The setpts option determines the speed of the resulting video. A multiplier of 1.0 ("setpts=1.0*PTS") is normal speed, whereas a multiplier of 0.5 is double speed and a multiplier of 2.0 is half speed. We varied this coefficient for each video so that all three results would be 90 seconds long.
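The weighted sum described in step 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the program we actually ran: the function names (triangular_weights, blend_frame) are made up for this example, the frames are assumed to be already loaded as equally sized uint8 RGB arrays, and neighbors that would fall before the first or after the last frame are simply skipped, which is an assumption about edge handling.

```python
import numpy as np

def triangular_weights(radius):
    """Weights for offsets -radius..+radius: `radius` at the center, 0 at the edges."""
    offsets = np.arange(-radius, radius + 1)
    return radius - np.abs(offsets)

def blend_frame(frames, index, radius=150):
    """Normalized weighted sum of the frames around frames[index].

    frames: sequence of uint8 arrays with shape (height, width, 3).
    Neighbors outside the movie are skipped (an assumption, see above).
    """
    if radius < 1:
        raise ValueError("radius must be at least 1")
    weights = triangular_weights(radius)
    total = np.zeros(frames[index].shape, dtype=np.float64)
    weight_sum = 0.0
    for offset, w in zip(range(-radius, radius + 1), weights):
        j = index + offset
        if 0 <= j < len(frames) and w > 0:
            # Accumulate in floating point to avoid uint8 overflow.
            total += w * frames[j].astype(np.float64)
            weight_sum += w
    # Normalize by the total weight actually used, then convert back to 8-bit RGB.
    return np.clip(np.rint(total / weight_sum), 0, 255).astype(np.uint8)
```

With radius=150 the weights run 0, 1, ..., 149, 150, 149, ..., 1, 0 across 301 frames, matching the description in step 2. Dividing by the sum of the weights keeps the output in the normal 0-255 range regardless of how many neighbors contributed.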
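The setpts coefficient for a given target length follows from simple arithmetic: the blended frames played at the input rate of 30 frames per second (the -r 30 in the command above) have a natural duration of frame_count / 30 seconds, and the multiplier is the target duration divided by that. The sketch below applies this to the frame counts reported in step 1. The helper name setpts_multiplier is hypothetical, and the printed values are derived from this formula rather than being the exact coefficients we passed to avconv.

```python
def setpts_multiplier(frame_count, target_seconds=90.0, input_fps=30.0):
    """Multiplier for setpts=<m>*PTS that makes the output last target_seconds."""
    natural_seconds = frame_count / input_fps  # duration at a multiplier of 1.0
    return target_seconds / natural_seconds

# Frame counts from step 1 (one extracted frame per second of movie).
frame_counts = {
    "Toy Story 1": 4866,
    "Toy Story 2": 5548,
    "Toy Story 3": 6155,
}

for title, count in frame_counts.items():
    print(f"{title}: setpts={setpts_multiplier(count):.3f}*PTS")
```

A longer movie has more frames and therefore needs a smaller multiplier (a faster playback speed) to fit into the same 90 seconds.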