Friday, March 7, 2014

LED Projection Mapping with Webcams (and Math)

In my last post, I covered the initial steps to setting up an LED projection system that can handle arbitrary LED locations. The LEDs were all constrained to a single strip, mostly because I can't commit to dedicating the LEDs to any single project and cutting the strip up. I wrapped the strip around a cylinder and was able to project a few images and a video with reasonable success. The biggest problem was the imperfect placement of the LEDs.

My projection system depends on knowing the 3D position of every LED in the display in order to properly project an image. Using simple math to describe the positions like I did before only works if the LEDs are placed very accurately. To allow for projections on to an LED strip that is not placed on any sort of simple mathematical curve, I've worked out a method of automatically locating every LED in 3D space using some webcams and some more math.

Single-Camera Mapping

To start, let's look at how a webcam can be used to recognize an LED.

An appropriately messy pile of LEDs.

In an image recorded from the camera, the strip appears as a mostly white strand with very small non-white regions denoting the location of an LED. Not a great situation for automatic feature detection. What happens when we turn on an LED? This is done by sending the appropriate commands to my Arduino LED strip controller to turn on exactly one LED.

Now we have a bright spot showing us the location of a single LED. To automatically locate this LED, we could write an algorithm to search through the camera image to find the brightest spot and return its position. Unfortunately, this kind of method is easily confused by anything else in the camera image that is bright. A way to fix this is to subtract off the image with the LED off to eliminate anything that hasn't changed between the two frames.

Now all we have left is a single bright point so search for. Using OpenCV to handle capturing and storing the webcam images, the location of the bright spot can be located as such:

Show/Hide Code

These positions are not the full 3D positions of each LED that we will eventually need, but just the projection of each LED position on to the webcam field of view. We can loop through each LED in the strip and determine this projected position:

You might notice that when an LED lights up a significant portion of the paper the strip is taped to, the LED-finding algorithm picks a point somewhere between the actual LED and the lit up paper. This is because the algorithm is finding the center of light in the image as opposed to the peak of light. I feel this is more appropriate in this case due to the fact that some of the LEDs are pointed away and will only be seen in the light they shine on the paper. Instead of treating those LEDs as lost, I might as well register the lit up paper as the source of light from the viewers perspective and map it appropriately.

This is enough information to perform a projection mapping that is only viewable from the perspective of the webcam.

This single-camera method is not able to determine the distance from the camera to any of the LEDs. All it can do it tell you the projected position of each LED on the image plane. In order to generalize this projection mapping, we need some method of supplementing the 2D position information provided so far with an estimate of the camera-to-LED distance.

Multi-Camera Mapping

Consider a single camera looking at a single lit LED. The LED position on the camera image gives us the angle relative to the direction the camera is pointing that the LED is located. From the LED position (as determined in the above algorithm), we can work out a vector line that originates from the camera and hits the LED at some point:
\[ \vec{f} = \vec{c} + t\hat{\vec{n}} \]
Here, $\vec{c}$ is the position of the camera, $\hat{\vec{n}}$ is the normal vector originating from the camera and pointing towards the LED, and $t$ is the vector line distance parameter. The issue here is that we don't know how far along this vector the LED sits, or rather, what the appropriate value of $t$ is. But suppose we add a second camera that can see the single LED but is positioned elsewhere. Again we can define a vector line originating from this new camera that hits the LED. If we know the positions and orientations of the two cameras relative to each other, we know the equations for each line in a common coordinate system.
\[ \vec{f_1} = \vec{c_1} + t\hat{\vec{n_1}} \]
\[ \vec{f_2} = \vec{c_2} + s\hat{\vec{n_2}} \]
Ideally, these two lines will intersect ($\vec{f_1} = \vec{f_2}$) exactly where the LED exists in 3D space. Solving for this intersection point using some high school math gives us this point. In this way, we can use two cameras to determine the 3D position of each LED.

Unfortunately, these two vector lines will rarely intersect. Due to imperfections in the camera positions, the measurements thereof, and the camera optics, the two vector lines will most often be skew. Instead of finding the point of intersection, we need to find the closest point between these two lines. To do this, we need to find the values for $s$ and $t$ that minimize the distance between the two lines. One way to solve for these values is to write out the distance between the two lines for any ($s$,$t$) and set its derivative with respect to each quantity equal to zero. This is a pain to write out, so here's an easier way of doing it.

First we define the distance vector:

\[ \vec{d}(s,t) = \vec{f_1} - \vec{f_2} = \vec{c_1} - \vec{c_2} + t\hat{\vec{n_1}} - s\hat{\vec{n_2}} \]

We want to know the values of $s$ and $t$ that minimize the length of this vector. We also know from geometry that when this vector distance is minimized, it will be perpendicular to both original lines. We can express this by saying that there will be no component of the distance line along the original lines:

\[ \hat{\vec{n_1}} \cdot \vec{d}(s,t) = 0 \]
\[ \hat{\vec{n_2}} \cdot \vec{d}(s,t) = 0 \]

Expanding these out and expressing the two equations with two unknowns as a matrix problem,

\[ \left( \begin{array}{cc}
\hat{\vec{n_1}} \cdot \hat{\vec{n_1}} & -\hat{\vec{n_1}} \cdot \hat{\vec{n_2}} \\
\hat{\vec{n_2}} \cdot \hat{\vec{n_1}} & -\hat{\vec{n_2}} \cdot \hat{\vec{n_2}} \end{array} \right)
\begin{pmatrix} s \\ t \end{pmatrix} =
\begin{pmatrix} -\hat{\vec{n_1}} \cdot (\vec{c_1} - \vec{c_2}) \\ -\hat{\vec{n_2}} \cdot (\vec{c_1} - \vec{c_2}) \end{pmatrix}

The diagonal of the 2x2 matrix can of course be simplified to 1 and -1. Solving this matrix problem for $s$ and $t$ allows us to compute where on each original vector line the distance vector hits. Since I have no reason to favor one camera over the other, I assume the LED is most likely located half-way between these two points.

With the matrix inversion expanded out, the code to find the LED based on input vectors looks like this:

This of course depends on having a decent estimate of the two camera-to-LED vectors. I've decided to place my two cameras so that they both are looking at a common point in space that I decide is the origin of the coordinate system. This way, I can simple measure the location of each camera with a ruler, and know not only the vector positions but also the normal vector that is produced when looking at the middle of each camera image. When a camera locates an LED in its field of view, the normal vector is then determined relative to the one pointing to the origin. The process of finding these normal vectors is a little tedious and requires a few 3D rotation matrices. I'll leave the math and code for that process out of this post for simplicity.

To test out this method, I set up two identical webcams in two identical 3D printed frames pointed at a sphere of LEDs:

Spiral Sphere of LEDs, for another project

Similar to before, I march through each LED turning it off and on to produce background-subtracted images that make locating the LED easy. Instead of mapping the image coordinates directly to a source image, I run the two estimates of LED location through the 3D mapping algorithm described above. The resulting point cloud looks promising:

A few pixels were poorly located by one or both cameras and ended up having strange 3D positions. The most obvious case is the single point floating outside of the sphere. A stronger criteria on whether or not a camera has successfully located an LED might eliminate these outliers. There is also a reasonable amount of distortion on what I expected to be one side of a perfect sphere. I'll be honest on that part, I think I messed up the math in my code that converts an LED location in image coordinates to a normal vector. Before I wrote up this algorithm in C, I prototyped it in an easier language to make sure it worked. Somewhere in translating to C I must have missed a minus sign somewhere, but it seems to be hidden well. If I ever plan on using this method for a more serious project, I'll have to go back and find that bug to fix this distortion.

The reason so few LEDs made it into the final point cloud is that only LEDs visible through both cameras can be located in 3D space. Spacing the cameras out more increases the accuracy of this method, but decreases the number of LEDs on the sphere they can both see. One solution to this problem would be to add more cameras around the sphere. This would increase the number of LEDs seen with two cameras, and introduces the possibility of using three cameras at once to locate a single LED. When using three or more cameras, finding the point in space that minimizes the distance to each camera-to-LED vector line becomes a least squares problem, similar in form to the one I went through on my robot arm project.

The next step in this project on LEDs is to do something visually interesting with this mapping technique. So far I've just been projecting the colorbar test image to verify the mapping, but the real point is to project more complicated images and videos onto a mass of LEDs. In the next few weeks, I will be talking about the process of using this projection mapping to do something interesting with the 3D printed spiral sphere I've shown at the end of this post.