Extracting audio from video information

This story is a bit old, but it was orphaned in one of my browser tabs. This is some grade-A sci-fi hocus pocus:

Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a potato-chip bag photographed from 15 feet away through soundproof glass.

In other experiments, they extracted useful audio signals from videos of aluminum foil, the surface of a glass of water, and even the leaves of a potted plant.


In the experiments reported in the Siggraph paper, the researchers also measured the mechanical properties of the objects they were filming and determined that the motions they were measuring were about a tenth of a micrometer. That corresponds to five thousandths of a pixel in a close-up image, but from the change of a single pixel’s color value over time, it’s possible to infer motions smaller than a pixel.

Suppose, for instance, that an image has a clear boundary between two regions: Everything on one side of the boundary is blue; everything on the other is red. But at the boundary itself, the camera’s sensor receives both red and blue light, so it averages them out to produce purple. If, over successive frames of video, the blue region encroaches into the red region — even less than the width of a pixel — the purple will grow slightly bluer. That color shift contains information about the degree of encroachment.
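
To make the red/blue example concrete, here is a toy sketch in Python. It is not the researchers' algorithm, just the averaging idea from the paragraph above under idealized assumptions: a perfectly linear sensor, pure red and pure blue regions, no noise, and no 8-bit rounding. The function names are made up for illustration.

```python
# Toy model of one pixel straddling a red/blue boundary.
# Assumptions (not from the paper): linear sensor, no noise, no quantization.

RED = (255.0, 0.0, 0.0)
BLUE = (0.0, 0.0, 255.0)


def boundary_pixel_color(blue_fraction):
    """Color the sensor reports when `blue_fraction` of the pixel is blue."""
    return tuple(
        blue_fraction * b + (1.0 - blue_fraction) * r
        for r, b in zip(RED, BLUE)
    )


def estimate_blue_fraction(color):
    """Invert the linear mix: the blue channel alone gives the coverage."""
    return color[2] / 255.0


def estimate_shift(color_before, color_after):
    """Boundary movement in pixel widths (positive = blue encroaches on red)."""
    return estimate_blue_fraction(color_after) - estimate_blue_fraction(color_before)


# The boundary starts mid-pixel, then moves five thousandths of a pixel.
before = boundary_pixel_color(0.500)
after = boundary_pixel_color(0.505)
shift = estimate_shift(before, after)
print(f"estimated shift: {shift:.4f} pixels")
```

In this idealized setup a shift of 0.005 pixels changes the blue channel by only a little over one level out of 255, so a real camera would presumably need many pixels and many frames averaged together before a signal that small rises above the noise.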

In recent years, researchers have developed methods for detecting people's heart rates purely from video, using small fluctuations in the color of their skin.

What will happen when we are naked to computer vision? What if we can no longer hide when our heart starts racing, our skin flushes, or our hand quivers ever so subtly? We always thought we'd be the ones administering the Voight-Kampff test to cull the replicants from the humans, but maybe it's the reverse that arrives first. Machines just sitting there motionless, staring at us, and seeing everything.