Tuesday, October 18, 2011

Video to Texture Streaming - i.MX53 processor

The low fps when streaming video to a texture comes from the glTexImage2D/glTexSubImage2D path, which is what most programmers use for this task. These calls are relatively slow because the driver copies the data through intermediate buffers before it actually reaches the GPU, where it is processed and displayed.
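For reference, a minimal sketch of that conventional per-frame upload (assuming a texture already created with glTexImage2D and a camera frame in a matching format; the variable names here are illustrative, not from the original demo):

/* Conventional path: push the whole frame through the GL driver every frame */
glBindTexture(GL_TEXTURE_2D, textureId);
glTexSubImage2D(GL_TEXTURE_2D, 0,          /* target, mip level             */
                0, 0,                      /* x/y offset                    */
                camWidth, camHeight,       /* frame size                    */
                GL_RGBA, GL_UNSIGNED_BYTE, /* format and type               */
                frame->imageData);         /* copied again inside the driver */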

When I was working on an Augmented Reality demo, it ran at about 15 fps with 320x240 images; if I wanted to display a higher resolution for a better-looking application, it dropped to 7 fps, which is pretty bad.

In some recent research on how to improve the frame rate of my application, I found that we can write the data (image) directly to the GPU buffer and display it without using the glTexImage2D function.

The application used for this test (the video can be found at the end of this post) simply grabs an image from the webcam and uses it as a texture on a plane. The webcam captures a live YouTube video stream being displayed on my desktop monitor and sends the data for processing at 30 fps (its maximum speed at 800x640). The application has two threads: one for video capture and the other for rendering. With the direct-to-GPU write, the render thread now reaches 80 fps for 800x640 images!
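A minimal sketch of that two-thread layout is shown below, assuming the OpenCV 1.x C capture API and pthreads; the names are illustrative and not the actual demo code:

#include <pthread.h>
#include <opencv/cv.h>
#include <opencv/highgui.h>

static IplImage *g_latestFrame = NULL;                          /* shared frame buffer */
static pthread_mutex_t g_frameLock = PTHREAD_MUTEX_INITIALIZER; /* protects the buffer */

/* Capture thread: grabs frames from the webcam as fast as the camera delivers them */
static void *captureThread(void *arg)
{
    CvCapture *cap = cvCaptureFromCAM(0);

    while (1)
    {
        IplImage *frame = cvQueryFrame(cap);   /* ~30 fps from the camera */

        pthread_mutex_lock(&g_frameLock);
        if (g_latestFrame == NULL)
            g_latestFrame = cvCloneImage(frame);
        else
            cvCopy(frame, g_latestFrame, NULL);
        pthread_mutex_unlock(&g_frameLock);
    }
    return NULL;
}

/* The render thread runs the OpenGL ES loop independently, taking g_latestFrame
   as its texture source (see the render-loop sketch later in this post). */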

Freescale's OpenGL ES API gives you some extra functions that allow you to write directly to the GPU buffer.

Below you can find the piece of code that does all the magic:

#include <string.h>       /* memcpy */
#include <EGL/egl.h>
#include <EGL/eglext.h>   /* the *_FSL tokens come from Freescale's i.MX EGL headers */
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h> /* glEGLImageTargetTexture2DOES */
#include <opencv/cv.h>    /* IplImage */

extern EGLImageKHR g_imgHandle;  /* EGL image handle, defined elsewhere in the application */

void LoadGLTextures (EGLDisplay egldisplay, IplImage *texture)
{
    /* Set up the eglImage that backs the texture (Freescale EGL extensions) */
    char *imageBuffer = NULL;
    static int start = 0;

    EGLint attribs[] = { EGL_WIDTH,  TEXTURE_W,
                         EGL_HEIGHT, TEXTURE_H,
                         EGL_IMAGE_FORMAT_FSL, EGL_FORMAT_BGRA_8888_FSL,
                         EGL_NONE };

    if (!start)
    {
        /* Create the EGL image once and attach it to the currently bound
           GL_TEXTURE_2D texture (the texture object is bound elsewhere) */
        g_imgHandle = eglCreateImageKHR(egldisplay, EGL_NO_CONTEXT,
                                        EGL_NEW_IMAGE_FSL, NULL, attribs);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, g_imgHandle);
        start = 1;
    }

    /* Get a CPU-visible pointer to the GPU buffer behind the image... */
    eglQueryImageFSL(egldisplay, g_imgHandle, EGL_CLIENTBUFFER_TYPE_FSL,
                     (EGLint *)&imageBuffer);

    /* ...and copy the camera frame straight into it */
    memcpy (imageBuffer, texture->imageData, texture->imageSize);
}

As you can see it is pretty simple: we create an image, its handle is stored in g_imgHandle, and the texture is then initialized from that image handle. Note that this is done only once.

Once the image and texture are initialized, we call eglQueryImageFSL, which gives us a pointer to the GPU buffer, and the frame data is then written to that buffer using memcpy.
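A minimal sketch of how this might be called each frame from the render thread, reusing the g_latestFrame / g_frameLock pair from the capture-thread sketch above (texture creation, binding and the draw function are assumed to exist elsewhere; the names are illustrative):

/* Render loop: copy the newest camera frame into the GPU buffer and draw */
while (running)
{
    pthread_mutex_lock(&g_frameLock);
    if (g_latestFrame != NULL)
        LoadGLTextures(egldisplay, g_latestFrame);  /* memcpy into the eglImage */
    pthread_mutex_unlock(&g_frameLock);

    drawTexturedPlane();                            /* regular GLES 2.0 draw call */
    eglSwapBuffers(egldisplay, eglsurface);
}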

And the result is:


Note how fast the video is displayed as a texture on the plane.

EOF !

Tuesday, March 29, 2011

Gesture Recognition Project - step 05 (final) - i.MX53 processor

Hi There !

This post is probably the last step of this project using this approach (single camera). As a use-case example, channel, volume and toggle on/off commands for a set-top box or TV were created.

The gesture descriptions are:

1 - number 1 (index finger) --> channel change option
2 - number 2 (index and middle fingers) --> volume change option
3 - number 0 (closed hand) --> toggle on/off option
4 - number 5 (opened hand) --> cancel command


and for every option a motion sequence is needed to change its value, for example (a rough sketch of how gestures and motions could be wired together follows this list):

1 - channel --> horizontal motion (back and forth)
2 - volume --> vertical motion (up and down)
3 - toggle on/off --> circular motion
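A rough sketch of the gesture-to-command mapping as a small state machine (illustrative only; these names are not from the real demo code):

/* Which option the recognized gesture selects */
typedef enum { OPT_NONE, OPT_CHANNEL, OPT_VOLUME, OPT_POWER } Option;

static Option selectOption(int gesture)
{
    switch (gesture)
    {
        case 1:  return OPT_CHANNEL;  /* index finger           */
        case 2:  return OPT_VOLUME;   /* index + middle fingers */
        case 0:  return OPT_POWER;    /* closed hand            */
        case 5:  return OPT_NONE;     /* opened hand = cancel   */
        default: return OPT_NONE;
    }
}

/* Once an option is selected, the tracked hand motion changes its value:
   horizontal motion -> channel up/down, vertical motion -> volume up/down,
   circular motion   -> toggle on/off. */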


The result can be viewed below:


The delay between options is needed in order to avoid noise and to make sure that the user really wants to enter an option.

The whole process runs at 15 FPS.

EOF !

Thursday, February 24, 2011

Gesture Recognition Project - step 04 - i.MX53 processor

New achievement! The ANN was trained to recognize 8 gestures, but only 7 are shown in this post (the 8th one was censored =D).

Here is the mean squared error evolution during the ANN training:


(MSE value vs. training epochs)
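For reference, the error being tracked is the mean squared error over the training set; one common formulation (not taken from the project code) is

E = \frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{m} \left( d_j^{(k)} - y_j^{(k)} \right)^2

where N is the number of training samples, m is the number of output neurons, d is the desired output and y is the network output at the current epoch.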

The main problem in this step is the set of patterns used to train the ANN: they are too similar, so the ANN sometimes gets confused or treats an input as corrupted data and then approximates it to the wrong gesture. See the results below:



From this point, some investigation will be done on:

1) improving the hand detection (using a glove?!?! covering the arms?!?!)
2) using new gestures in a useful use-case
3) detecting movements and recognizing motion gestures

EOF!

Wednesday, February 23, 2011

Gesture Recognition Project - step 03 - i.MX53 processor

Alright, one more video regarding one more step on this project.
So, what is new?

In this step some modifications on the ANN were done, as follows:

1 - changed the output method, with more neurons in the output layer
2 - changed the activation function to hyperbolic tangent
3 - added momentum to the Backpropagation algorithm (a small sketch of items 2 and 3 follows this list)
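A minimal sketch of items 2 and 3, assuming a standard MLP weight update (illustrative code, not the project's implementation):

#include <math.h>

/* Hyperbolic tangent activation and its derivative (expressed via the output y) */
static double activation(double v)      { return tanh(v); }
static double activationDeriv(double y) { return 1.0 - y * y; }

/* Backpropagation weight update with momentum:
   dw(t) = eta * delta * input + alpha * dw(t-1) */
static void updateWeight(double *w, double *prevDw,
                         double eta, double alpha,
                         double delta, double input)
{
    double dw = eta * delta * input + alpha * (*prevDw);
    *w      += dw;
    *prevDw  = dw;
}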

The ANN was now trained with three different patterns; the result can be seen below:


Notice that the little delay in the video is due to a lot of images being displayed at the same time and the dirty code written during development.

For the next steps I will clean up the code (if I have time =)).

EOF!

Monday, February 21, 2011

Gesture Recognition Project - step 02 - i.MX53 processor

This second step includes:

1 - pre-processing (color conversions, mathematical morphology and filters)
2 - skin color detection (segmentation)
3 - convex hull for finger detection (not used anymore, although it is still being computed in the video)
4 - Artificial Neural Network for pattern classification (gesture recognition)

Procedure used for the recognition module:

Artificial Neural Network (Multilayer Perceptron architecture) trained with the Backpropagation algorithm. The input data is a downsized binary image of the detected contours in the pre-processed image. The sample and test data used to train the ANN were acquired previously with the same application, saving the pixel values (binary image) to a TXT file.

Some issues appeared during the training, since the values of the pixels were 0 for black and 1 for white. As this ANN is relatively big (50x50 image + bias/threshold input = 2501 inputs), the neurons were getting saturated in the very first training epoch. With the help of my PhD supervisor, Professor Ivan Nunes da Silva (University of São Paulo - EESC), the pixel values were normalized to smaller values: -0.1 for black pixels and 0.1 for white pixels. With this approach the ANN was able to converge and learn the different patterns.
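A minimal sketch of that normalization step, assuming the downsized 50x50 binary image comes from OpenCV (the function name and the fixed bias value are illustrative):

/* Map binary pixels to small symmetric values so the neurons do not
   saturate in the first epoch: black -> -0.1, white -> 0.1 */
static void normalizeInput(const IplImage *binary50x50, double *netInput)
{
    int x, y, i = 0;

    for (y = 0; y < 50; y++)
    {
        const unsigned char *row = (const unsigned char *)
            (binary50x50->imageData + y * binary50x50->widthStep);

        for (x = 0; x < 50; x++)
            netInput[i++] = (row[x] > 0) ? 0.1 : -0.1;
    }

    netInput[i] = 1.0;  /* bias/threshold input -> 2501 inputs in total */
}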

For this second step the ANN was trained with only 2 gestures: opened hand and closed hand. For a quick test, I kept only 30 neurons in the hidden layer, and it took only 5 minutes to train the ANN.

Here is the result:




The images displayed on the LCD are: the HSV segmented image, the grayscale segmented image, the convex hull (not used for the recognition), the ANN input data (50x50 binary image) and the detection result.

UPDATE
------

Neuron saturation can also be avoided by using smaller values for the network weights. Before training, the weights must be initialized with random values; make sure that these initial values are small.
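A minimal sketch of such an initialization, drawing the weights uniformly from a small symmetric range (the range itself is illustrative):

#include <stdlib.h>

/* Initialize a weight vector with small random values, e.g. in [-0.1, 0.1],
   so the weighted sums stay in the near-linear region of the activation */
static void initWeights(double *w, int count, double range)
{
    int i;
    for (i = 0; i < count; i++)
        w[i] = range * (2.0 * rand() / (double)RAND_MAX - 1.0);
}

/* Example: initWeights(hiddenWeights, 2501 * 30, 0.1); */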

EOF!

Friday, February 11, 2011

Gesture Recognition Project - step 01 - i.MX53 processor

This is the result of the first step of the Gesture Recognition Project using the Freescale i.MX53 ARM Cortex-A8 processor, running at 800 MHz, and the OpenCV library.

Tuesday, February 8, 2011

Augmented Reality Test (OpenCV Library + OpenGL ES 2.0) running on i.MX51 Processor



EOF!

Augmented Reality (multimarker) on i.MX51 Processor



EOF!

Augmented Reality (Marker Detection) on i.MX51 Processor

i.MX51 Running OpenGL ES 2.0 and OpenCV

i.MX51 Processor performing Object tracking based on Camshift Algorithm running on LTIB Linux



EOF!

Object Tracking and Filters using OpenCV running on i.MX51 Processor



EOF!

Camshift demo running on i.MX51 Processor

Color Segmentation using Fuzzy Inference System



EOF !

Color Segmentation on i.MX51 Processor



EOF!

Canny Edge Detector running on i.MX51 Processor



EOF!

New blog, Old stuff !

Since this is a new blog but some applications have already been done, the following posts show what has been made so far.

EOF !