Getting started with Microsoft Kinect SDK

Getting started with Microsoft Kinect SDK - Depth and Video space

Written by Mike James

Monday, 01 August 2011


At this point we need to use player as a mask that sets every pixel that doesn't correspond to a player to zero. This can be done with a simple logical operation:

pixel at 2x,2y = pixel at 2x,2y & player;

or more succinctly

pixel at 2x,2y &= player;

There is a small problem here in that the pixel in the depth image at x,y corresponds to four pixels in the video image: (2x,2y), (2x+1,2y), (2x,2y+1) and (2x+1,2y+1). The reason for this is simple - the video image has twice the resolution of the depth image in each direction. Also each video pixel occupies four bytes in the array. So for each depth pixel we need to process two runs of eight bytes, one starting at 2x,2y and one starting at 2x,2y+1:

for (int k = 0; k < 8; k++)
{
 VImage.Bits[
  indexOfPixelinBytes(x*2,y*2,
       VImage.Width,VImage.BytesPerPixel)
          + k] &= player;
 VImage.Bits[
  indexOfPixelinBytes(x * 2, y * 2+1, 
       VImage.Width,VImage.BytesPerPixel)
         + k] &= player;
}

The first statement ANDs the player mask with the eight bytes in row 2y and the second processes the eight bytes in row 2y+1. Notice that we use indexOfPixelinBytes again, only now the number of bytes per pixel is four.
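The indexOfPixelinBytes helper comes from earlier in this series and isn't repeated on this page. As a reminder, here is a minimal sketch of what it is assumed to do, namely that the image is stored row by row with no padding between rows. The class name PixelIndex and the BlockOffsets method are hypothetical and only there for illustration:

```csharp
using System;

static class PixelIndex
{
    // Assumed layout: rows stored one after another with no padding,
    // and bytesPerPixel consecutive bytes for each pixel.
    public static int indexOfPixelinBytes(
        int x, int y, int width, int bytesPerPixel)
    {
        return (x + y * width) * bytesPerPixel;
    }

    // Start offsets of the two 8-byte runs in the video image that
    // cover the 2x2 block belonging to depth pixel (x, y).
    public static int[] BlockOffsets(
        int x, int y, int videoWidth, int bytesPerPixel)
    {
        return new int[]
        {
            indexOfPixelinBytes(2 * x, 2 * y,
                                videoWidth, bytesPerPixel),
            indexOfPixelinBytes(2 * x, 2 * y + 1,
                                videoWidth, bytesPerPixel)
        };
    }
}
```

For depth pixel (1,1) and a video image 640 pixels wide with four bytes per pixel, the two runs start at byte offsets 5128 and 7688.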

Finally, when the for loops are complete, we can display the result in a PictureBox:

 pictureBox1.Image=PImageToBitmap(VImage);
}

The PImageToBitmap function has been used in previous articles:

Bitmap PImageToBitmap(PlanarImage PImage)
{
 Bitmap bmap = new Bitmap(
  PImage.Width,
  PImage.Height,
  PixelFormat.Format32bppRgb);
 BitmapData bmapdata = bmap.LockBits(
  new Rectangle(0, 0, PImage.Width,
  PImage.Height),
  ImageLockMode.WriteOnly,
  bmap.PixelFormat);
 IntPtr ptr = bmapdata.Scan0;
 Marshal.Copy(PImage.Bits,
  0,
  ptr,
  PImage.Width *
   PImage.BytesPerPixel *
    PImage.Height);
 bmap.UnlockBits(bmapdata);
 return bmap;
}

If you try this out you will discover that it does work - sort of. The area of the video image that is masked out does correspond to the shape of a player but it is shifted to one side and the shift varies as you move closer or further away. What you are observing is a depth parallax effect identical to the two views you get from two separated cameras as used for 3D imaging.
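To see why the shift changes with distance, it helps to look at the standard pinhole-camera result for two parallel cameras: the horizontal shift in pixels is the focal length (in pixels) times the distance between the cameras, divided by the depth of the point. The sketch below just evaluates that formula. The class and method names are hypothetical, and the numbers in the note that follows are illustrative values, not Kinect calibration data:

```csharp
using System;

static class Parallax
{
    // Horizontal shift, in pixels, between the images of the same
    // point seen by two parallel pinhole cameras.
    // focalLengthPixels: focal length expressed in pixels
    // baselineMm:        distance between the two cameras
    // depthMm:           distance of the point from the cameras
    public static double ShiftInPixels(
        double focalLengthPixels,
        double baselineMm,
        double depthMm)
    {
        return focalLengthPixels * baselineMm / depthMm;
    }
}
```

With an assumed focal length of 500 pixels and an assumed camera separation of 25mm, a point 1000mm away is shifted by 12.5 pixels and a point 2000mm away by 6.25 pixels, so the mask slides relative to the player as the player moves closer or further away.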

Clearly we need to make the connection between the pixels in the video and depth image in a more sophisticated way.

Converting from depth to video

Converting from depth to video co-ordinates is simply a matter of projective geometry. What we have are two perspective views of the same scene and so it is perfectly possible to implement a function which converts between them - possible, but not easy to get right. For this reason we have to be thankful for the GetColorPixelCoordinatesFromDepthPixel function. Its main problem is that it has a very long name. It needs not only the depth co-ordinates but also the depth of the pixel in question. This isn't unreasonable as the transformation varies with depth. What is slightly odd, if not downright unreasonable, is that it needs the unshifted depth, i.e. not the depth in millimeters that we have used in previous articles but the depth in millimeters shifted three places to the left.

So our first task is to compute the depth, and we also need two variables to hold the video co-ordinates:

byte player;
short d;
int vx, vy;

In the for loop, just after the player index computation, add:

d = (short)(depthimage.Bits[
     indexOfPixelinBytes(x, y, 
      depthimage.Width, 
      depthimage.BytesPerPixel)+1]<<8|
      (depthimage.Bits[
       indexOfPixelinBytes(x, y, 
        depthimage.Width, 
        depthimage.BytesPerPixel)]&0xF8));

This is just the usual repackaging of the depth data but this time without the three-place shift to the right. The & 0xF8 clears the three player index bits in the low byte.
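The bit layout is easier to follow with the three extractions side by side. This is a minimal sketch, assuming the layout used throughout this series: two bytes per depth pixel, low byte first, with the player index in the bottom three bits and the depth in millimeters in the bits above. The class and method names are hypothetical:

```csharp
using System;

static class DepthUnpack
{
    // low is the first byte of the depth pixel, high the second.
    // Bits 0-2 of the 16-bit value hold the player index.
    public static int PlayerIndex(byte low)
    {
        return low & 0x07;
    }

    // Depth left in its packed position (millimeters << 3), which
    // is the form the text says the conversion function needs.
    public static short PackedDepth(byte low, byte high)
    {
        return (short)((high << 8) | (low & 0xF8));
    }

    // Depth in millimeters, as used in the earlier articles.
    public static int DepthInMillimeters(byte low, byte high)
    {
        return ((high << 8) | low) >> 3;
    }
}
```

For example, a pixel 2000mm away that belongs to player 1 is stored as the bytes 0x81, 0x3E. The player index is 1, the packed depth is 16000 and the depth in millimeters is 2000.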

Next we need to compute the video co-ordinates vx,vy from the depth co-ordinates x,y and the depth d:

nui.NuiCamera.
 GetColorPixelCoordinatesFromDepthPixel(
  VFrame.Resolution, 
  VFrame.ViewArea, 
  x, y, d, 
  out vx, out vy);

The function also needs the video frame resolution and the view area in case a zoom has been applied. The bad news is that the returned co-ordinates aren't guaranteed to be within the video image. It is perfectly possible for points at the edge of the depth image to map outside the video image because they cannot be seen from the video camera's position. The solution is simply to map such points to the edge of the video image. The limits are Width - 2 and Height - 2, rather than Width - 1 and Height - 1, because the mask operation also touches the pixel to the right and the row below:

vx = Math.Max(0, Math.Min(
               vx, VImage.Width - 2));
vy = Math.Max(0, Math.Min(
               vy, VImage.Height - 2));

Finally we can use vx,vy to do the mask operation:

for (int k = 0; k < 8; k++)
{
 VImage.Bits[
  indexOfPixelinBytes(vx, vy, 
   VImage.Width, 
   VImage.BytesPerPixel) + k] &= player;
 VImage.Bits[
  indexOfPixelinBytes(vx, vy + 1, 
   VImage.Width, 
   VImage.BytesPerPixel) + k] &= player;
}

Now if you run the program you will find that the mask doesn't always fit perfectly, but there is no longer a regular shift in its location relative to the player. You will also notice that there are areas that are not masked at all - this is just because no depth pixel mapped to their co-ordinates. If you want a full mask without holes and other artifacts you are going to have to put in a little more work.
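One possible way to get rid of the unmasked background areas, offered here as a sketch rather than as the method used in this article, is to work in two passes. First record which video pixels a player's depth pixels map to, then clear every video pixel that wasn't recorded. Any video pixel that no depth pixel reached is then treated as background. The class and method names are hypothetical, and vx,vy are assumed to have been clamped as described above:

```csharp
using System;

static class FullMask
{
    // Pass 1: mark the 2x2 block of video pixels whose top-left
    // corner is (vx, vy) as belonging to a player.
    // keep holds one flag per video pixel, stored row by row.
    public static void MarkBlock(
        bool[] keep, int width, int vx, int vy)
    {
        keep[vx + vy * width] = true;
        keep[vx + 1 + vy * width] = true;
        keep[vx + (vy + 1) * width] = true;
        keep[vx + 1 + (vy + 1) * width] = true;
    }

    // Pass 2: zero every video pixel that was never marked.
    // bits holds bytesPerPixel bytes for each video pixel.
    public static void Apply(
        bool[] keep, byte[] bits, int bytesPerPixel)
    {
        for (int p = 0; p < keep.Length; p++)
        {
            if (keep[p]) continue;
            for (int k = 0; k < bytesPerPixel; k++)
            {
                bits[p * bytesPerPixel + k] = 0;
            }
        }
    }
}
```

In the event handler this would mean calling MarkBlock in place of the inner k loop whenever player is non-zero, and calling Apply once after the x and y loops. The trade-off is that the gaps move: instead of unmasked patches of background you may see black specks inside the player, which would need a further smoothing step to fill.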

The complete event handler is:

void nui_ColorFrameReady(
    object sender, 
    ImageFrameReadyEventArgs e)
{
 if (depthimage.Bits == null) return;
 ImageFrame VFrame = e.ImageFrame;
 PlanarImage VImage = VFrame.Image;
 byte player;
 short d;
 int vx, vy;

 for (int y = 0;
            y < depthimage.Height; y++)
 {
  for (int x = 0; 
             x < depthimage.Width; x++)
  {
   player = (byte)(depthimage.Bits[
     indexOfPixelinBytes(x, y, 
      depthimage.Width, 
      depthimage.BytesPerPixel)] & 0x07);
   if (player != 0) player = 0xFF;
   d = (short)(depthimage.Bits[
     indexOfPixelinBytes(x, y, 
      depthimage.Width, 
      depthimage.BytesPerPixel) + 1]<<8| 
      (depthimage.Bits[
        indexOfPixelinBytes(x, y, 
         depthimage.Width, 
         depthimage.BytesPerPixel)] & 0xF8));

   nui.NuiCamera.
    GetColorPixelCoordinatesFromDepthPixel(
     VFrame.Resolution, 
     VFrame.ViewArea, 
     x, y, d, 
     out vx, out vy);
   vx = Math.Max(0, 
    Math.Min(vx, VImage.Width - 2));
   vy = Math.Max(0, 
    Math.Min(vy, VImage.Height - 2));

   for (int k = 0; k < 8; k++)
   {
    VImage.Bits[
      indexOfPixelinBytes(vx, vy, 
      VImage.Width, 
      VImage.BytesPerPixel) + k] &= player;
    VImage.Bits[
      indexOfPixelinBytes(vx, vy + 1, 
      VImage.Width, 
      VImage.BytesPerPixel) + k] &= player;
   }
  }
 }            
 pictureBox1.Image = PImageToBitmap(VImage);
}

There are obviously lots of improvements that can be made to this code and many variations but this is the basic algorithm for making the connection between depth and video pixels.

You can download the code for the Windows Forms version of this program from the CodeBin (note you have to register first).


Next time we take a look into the skeletonization data and discover that Skeletons are not scary....
