Monday, April 25, 2016

UPDATE - Building OpenCV-2.4.X for Freescale's i.MX6 BSP (Yocto)

Just updating some information from an old post:

You can base your local.conf on the following one:

    MACHINE ??= 'imx6qpsabreauto'
    DISTRO ?= 'fsl-imx-fb'
    PACKAGE_CLASSES ?= "package_rpm"
    EXTRA_IMAGE_FEATURES = "debug-tweaks"
    USER_CLASSES ?= "buildstats image-mklibs"
    BB_DISKMON_DIRS = "\
        STOPTASKS,${DL_DIR},1G,100K \
        STOPTASKS,/tmp,100M,100K \
        ABORT,${TMPDIR},100M,1K \
        ABORT,${DL_DIR},100M,1K \
        ABORT,${SSTATE_DIR},100M,1K"
    PACKAGECONFIG_append_pn-qemu-native = " sdl"
    PACKAGECONFIG_append_pn-nativesdk-qemu = " sdl"
    ASSUME_PROVIDED += "libsdl-native"
    DL_DIR ?= "${BSPDIR}/downloads/"
    CORE_IMAGE_EXTRA_INSTALL += "libopencv-core-dev libopencv-imgproc-dev libopencv-objdetect-dev libopencv-ml-dev libopencv-calib3d-dev libopencv-highgui-dev"


That should solve most of your issues.


Tuesday, February 23, 2016

OpenCL Embedded Profile (GC2000) - Hello World Application

Based on our last post, let's take a look at a very basic "hello world" application. But before the walkthrough, there is some information you need to be aware of.

OpenCL applications are basically divided into 2 parts: host and device code.

The host code is responsible for hardware initialization, creating the CL objects (memory, program, kernel, command queues) and sending signals to the device in order to write/read/execute/sync data.

The device code is the OpenCL kernel (written in a C99-based language), which will be accelerated by the GPU. This is a very basic description just to keep things as easy as possible; you can find a lot of good and detailed information on the web that covers it in depth.

Below is the flow for an OpenCL application:

As an example, let's use a very simple CL kernel that only copies the input buffer to the output buffer.

Please note that the pieces of code below were modified from the original source (link available at the end of this post) just to make them more readable, since they are placed in separate functions.
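For reference, a minimal 1D copy kernel of this kind could look like the sketch below. This is illustrative only: the function name "hello_world" matches the name passed to clCreateKernel in the host code, but the exact argument types in the linked source may differ.

```c
// hello_world.cl - sketch of a minimal 1D copy kernel (illustrative;
// the original kernel in the linked source may differ slightly)
__kernel void hello_world (__global const char *input,
                           __global char *output)
{
    // each work-item copies the single element matching its global ID
    int id = get_global_id (0);
    output[id] = input[id];
}
```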

STEP 1 - Defining the Problem Size

OpenCL is meant to solve specific problems: once you have defined how your kernel operates, its argument list and the size of your memory objects can't be modified unless you release all the objects and re-create them with the new desired values.

Defining the problem size means defining our global work-group size and dimension; it can be the length of an array (1D, our use case) or a 2D/3D matrix (we will see that in another sample code in a future post).

In our Hello World application the global work-group size is 512, populated with random data (just for testing). We also need to set our local work-group size (how the global work is split up for execution). Based on the last post, the preferred local work-group size is 16 (per dimension), so we will use this value for better performance.

        cl_platform_id platform_id;
        cl_device_id device_id;
        cl_context context;
        cl_command_queue cq;
        cl_program program;
        cl_kernel kernel;
        cl_mem helloworld_mem_input = NULL;
        cl_mem helloworld_mem_output = NULL;

        // one-dimensional work-items
        int dimension = 1;
        // our problem size
        size_t global = 512;
        // preferred work-group size
        size_t local = 16;
        int size;
        // input data buffer - random values for the helloworld sample
        char *input_data_buffer;
        // output data buffer for results
        char *output_data_buffer;

        cl_int ret;

        // make our size equal our global work-group size
        size = global;
        input_data_buffer = (char *) malloc (sizeof (char) * size);
        if (! input_data_buffer) {
                printf ("\nFailed to allocate input data buffer memory\n");
                return 0;
        }
        output_data_buffer = (char *) malloc (sizeof (char) * size);
        if (! output_data_buffer) {
                printf ("\nFailed to allocate output data buffer memory\n");
                return 0;
        }
        // populate the input buffer with random values
        for (int i = 0; i < size; i++)
                input_data_buffer[i] = rand () % 255;

STEP 2 - Hardware Initialization

This is the basic step of an OpenCL application. Hardware initialization consists of:
  • Listing the available platforms (one GC2000)
  • Getting compute device information (the Vivante OCL EP device)
  • Creating the CL context
  • Creating the command queue (control information from host to device)

        cl_uint platforms, devices;
        cl_int error;

        // cl_int clGetPlatformIDs (cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms)
        error = clGetPlatformIDs (1, &platform_id, &platforms);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_int clGetDeviceIDs (cl_platform_id platform, cl_device_type device_type, cl_uint num_entries,
        //   cl_device_id *devices, cl_uint *num_devices)
        error = clGetDeviceIDs (platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &devices);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_context clCreateContext (cl_context_properties *properties, cl_uint num_devices,
        //    const cl_device_id *devices, void *pfn_notify (const char *errinfo,
        //    const void *private_info, size_t cb, void *user_data),
        //    void *user_data, cl_int *errcode_ret)
        cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform_id, 0};
        context = clCreateContext (properties, 1, &device_id, NULL, NULL, &error);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_command_queue clCreateCommandQueue (cl_context context, cl_device_id device,
        //     cl_command_queue_properties properties, cl_int *errcode_ret)
        cq = clCreateCommandQueue (context, device_id, 0, &error);
        if (error != CL_SUCCESS)
                return CL_ERROR;

STEP 3 - Create the OCL Objects (Program, Kernel and Memory)

This step is pretty straightforward: since you have already defined your problem size, you just need to set it on the OpenCL objects:

        cl_int error = CL_SUCCESS;

        // cl_program clCreateProgramWithSource (cl_context context, cl_uint count, const char **strings,
        //          const size_t *lengths, cl_int *errcode_ret)
        program = clCreateProgramWithSource (context, 1, (const char **)kernel_src, &kernel_size, &error);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_int clBuildProgram (cl_program program, cl_uint num_devices, const cl_device_id *device_list,
        //   const char *options, void (*pfn_notify)(cl_program, void *user_data), void *user_data)
        error = clBuildProgram (program, 1, &device_id, "", NULL, NULL);
        if (error < 0) {
                // cl_int clGetProgramBuildInfo (cl_program program, cl_device_id device, cl_program_build_info
                //   param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret)
                clGetProgramBuildInfo (program, device_id, CL_PROGRAM_BUILD_LOG, kernel_size, kernel_src, NULL);
                printf ("\n%s", kernel_src);
                return CL_ERROR;
        }

        // cl_kernel clCreateKernel (cl_program program, const char *kernel_name, cl_int *errcode_ret)
        // "hello_world" is the name of our kernel function inside the external file.
        kernel = clCreateKernel (program, "hello_world", &error);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags, size_t size,
        //    void *host_ptr, cl_int *errcode_ret)
        helloworld_mem_input = clCreateBuffer (context, CL_MEM_READ_ONLY, size, NULL, &error);
        if (error != CL_SUCCESS)
                return CL_ERROR;
        helloworld_mem_output = clCreateBuffer (context, CL_MEM_WRITE_ONLY, size, NULL, &error);
        if (error != CL_SUCCESS)
                return CL_ERROR;

STEP 4 - Set Kernel Arguments

The kernel arguments must be set in the host code, and their order has to match the kernel's signature; in our case, the input and output buffers respectively.

 // cl_int clSetKernelArg (cl_kernel kernel, cl_uint arg_index, size_t arg_size, 
 //       const void *arg_value)
 clSetKernelArg (kernel, 0, sizeof(cl_mem), &helloworld_mem_input);
 clSetKernelArg (kernel, 1, sizeof(cl_mem), &helloworld_mem_output);

STEP 5 - Execute the Kernel (Command Queue)

This is the part where all the magic happens. We write the data from the host to the device, send the start signal for the device to execute the kernel, and finally read the data back from the device to the host. Here is how it is done:

  • clEnqueueWriteBuffer: writes data to the device
  • clEnqueueNDRangeKernel: starts the kernel execution
  • clEnqueueReadBuffer: reads data back from the device

        // cl_int clEnqueueWriteBuffer (cl_command_queue command_queue, cl_mem buffer,
        //        cl_bool blocking_write, size_t offset, size_t cb,
        //        const void *ptr, cl_uint num_events_in_wait_list,
        //        const cl_event *event_wait_list, cl_event *event)
        error = clEnqueueWriteBuffer (cq, helloworld_mem_input, CL_TRUE, 0, size, input_data_buffer, 0, NULL, NULL);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_int clEnqueueNDRangeKernel (cl_command_queue command_queue, cl_kernel kernel,
        //        cl_uint work_dim, const size_t *global_work_offset,
        //        const size_t *global_work_size, const size_t *local_work_size,
        //        cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
        //        cl_event *event)
        error = clEnqueueNDRangeKernel (cq, kernel, dimension, NULL, &global, &local, 0, NULL, NULL);
        if (error != CL_SUCCESS)
                return CL_ERROR;

        // cl_int clEnqueueReadBuffer (cl_command_queue command_queue, cl_mem buffer,
        //        cl_bool blocking_read, size_t offset, size_t cb,
        //        void *ptr, cl_uint num_events_in_wait_list,
        //        const cl_event *event_wait_list, cl_event *event)
        error = clEnqueueReadBuffer (cq, helloworld_mem_output, CL_TRUE, 0, size, output_data_buffer, 0, NULL, NULL);
        if (error != CL_SUCCESS)
                return CL_ERROR;

STEP 6 - Clean Up the OpenCL Objects

To prevent memory leaks or any other issues, the CL objects must be released:

 clFlush (cq);

 clReleaseKernel (kernel);
 clReleaseProgram (program);
 clReleaseMemObject (helloworld_mem_input);
 clReleaseMemObject (helloworld_mem_output);
 clReleaseCommandQueue (cq);
 clReleaseContext (context);

Final Considerations

The sample application presented in this post is meant to demonstrate how to create a basic OpenCL-based application; it can get as complicated as you need it to be.

The complete and functional source code can be found here!

For information related to the OpenCL EP API, access the Khronos website here.

I hope the information shared in this post can help and give you some directions in your future projects.


Wednesday, September 30, 2015

Introduction to i.MX6Q/D (GC2000) Vivante OpenCL Embedded Profile

The purpose of this post is not to make you an OpenCL expert, but to provide the basic knowledge to take advantage of the i.MX6's GPGPU support and get your code (or part of it) accelerated by its graphics processing unit.

First of all, what are GPGPU and OpenCL?


  • Stands for General-Purpose Graphics Processing Unit.

  • Algorithms well-suited to GPGPU implementation are those that exhibit two properties: they are data parallel and throughput intensive.

  • Data parallel: means that a processor can execute the operation on different data elements simultaneously.

  • Throughput intensive: means that the algorithm is going to process lots of data elements, so there will be plenty to operate on in parallel.

  • Pixel-based applications such as computer vision and video and image processing are very well suited to GPGPU technology, and for this reason many of the commercial software packages in these areas now include GPGPU acceleration.
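To make the two properties concrete, here is a small, hypothetical per-pixel operation in plain C: every iteration depends only on its own input element, so on a GPGPU each iteration could become an independent work-item.

```c
#include <assert.h>
#include <stdint.h>

/* A data-parallel pixel operation: every output pixel depends only on the
 * corresponding input pixel, so a GPGPU could process all of them at once.
 * (Illustrative host-side C; on the GPU, each loop iteration would be one
 * work-item.) */
static void scale_brightness (const uint8_t *in, uint8_t *out, int n, int gain)
{
    for (int i = 0; i < n; i++) {
        int v = in[i] * gain;
        out[i] = (uint8_t)(v > 255 ? 255 : v);   /* saturate to 8 bits */
    }
}
```

On the GPU the loop disappears: the runtime launches n work-items and each one executes the loop body exactly once.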


  • Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors.

  • OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus application programming interfaces (APIs) that are used to define and then control the platforms.

  • OpenCL provides parallel computing using task-based and data-based parallelism.

  • OpenCL is an open standard maintained by the non-profit technology consortium Khronos Group.

  • Apple, Intel, Qualcomm, Advanced Micro Devices (AMD), Nvidia, Altera, Samsung, Vivante and ARM Holdings have adopted it.

There are A LOT of OpenCL tutorials on the web explaining all its concepts and capabilities. Below you will find only the most important points:

Introduction to OpenCL

  • In order to visualize the heterogeneous architecture in terms of the API and restrict memory usage for parallel execution, OpenCL defines multiple cascading layers of virtual hardware definitions.

  • The basic execution engine that runs the kernels is called a Processing Element (PE).

  • A group of Processing Elements is called a Compute Unit (CU).

  • Finally, a group of Compute Units is called a Compute Device.

  • A host system can interact with multiple Compute Devices on a system (e.g., a GPGPU and a DSP), but data sharing and synchronization are only coarsely defined at this level.

  • Each item a kernel works on is called a 'work-item'.

  • A simple example of this is determining the color of a single pixel (work-item) in an output image.

  • Work-items are grouped into 'work-groups', which are each executed in parallel to speed up calculation performance.

  • How big a work-group is depends on the algorithm being executed and the dimensions of the data being processed (e.g., one work-item per pixel for a block of pixels in a filter).

OpenCL runs in a 'data parallel' programming model where the kernel runs once for each item in an 'index space'. The dimensionality of the data being processed (e.g., 1-, 2-, or 3-dimensional arrays) gives the index space its name: NDRange, or N-dimensional range.
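As a rough host-side model (illustrative C, not OpenCL API code), a 2D NDRange behaves like a nested loop that invokes the kernel once per point of the index space; the loop counters play the role of get_global_id(0) and get_global_id(1):

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of a 2D NDRange: the OpenCL runtime conceptually invokes
 * the kernel once per point of the index space. Inside a real kernel,
 * get_global_id(0)/get_global_id(1) would return x/y. */
typedef void (*kernel_fn) (size_t x, size_t y, void *user);

static void run_ndrange_2d (size_t global_x, size_t global_y,
                            kernel_fn kernel, void *user)
{
    for (size_t y = 0; y < global_y; y++)
        for (size_t x = 0; x < global_x; x++)
            kernel (x, y, user);   /* one invocation == one work-item */
}

/* example "kernel": just count how many work-items were run */
static void count_kernel (size_t x, size_t y, void *user)
{
    (void)x; (void)y;
    (*(size_t *)user)++;
}
```

An 8x4 index space therefore runs the kernel 32 times; on a GPU those invocations execute in parallel instead of sequentially.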

Freescale’s i.MX6Q/D GPU (GC2000) OpenCL EP features

  • Vivante GC2000 GPGPU capable of running OpenCL 1.1 Embedded Profile (EP)

  • OpenCL Embedded Profile capabilities (that means, for instance, no atomic variables, and no mandated support for 3D images, 64-bit integers or double-precision floating point numbers)

  • 4x SIMD cores (vec4) shader units

  • Up to 512 general-purpose registers (128-bit each) per core

  • Maximum number of instructions for kernels is 512

  • 1-cycle throughput for all shader instructions

  • L1 cache of 4KB

  • 168 uniform registers for the vertex shader and 64 for the fragment shader

  • Single integer pipeline per core

  • In OpenCL Embedded Profile, the requirements for samplers are reduced, with the number of samplers decreased from 16 (FP, Full Profile) to 8 (EP), and the math precision (ULP) is slightly relaxed below the IEEE-754 specification for some functions

  • Lastly, in OpenCL EP the minimum image size is reduced to 2048 (from 8192 in FP) and the local memory requirement is reduced to 1KB (from 32KB in FP)

Each of the shader cores functions as a CU. The cores have a native vec4 ISA, thus the preferred vector width for all primitives is 4.

Code Optimization for Freescale’s i.MX6Q/D OpenCL EP

       Vector math inputs in multiples of 4.

      As mentioned previously, the GC2000 in the i.MX 6Q is a vec4 floating-point SIMD engine, so vector math always prefers 4 inputs (or a multiple of 4) for maximum math throughput.
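As a host-side illustration of the idea (hypothetical C, not Vivante-specific code), structuring a computation as four independent lanes mirrors how a vec4 unit would consume it:

```c
#include <assert.h>

/* Illustrative vec4-style accumulation: process floats 4 at a time, the way
 * a 4-wide SIMD unit prefers. n is assumed to be a multiple of 4 (pad the
 * data if it is not). */
static float sum_vec4 (const float *data, int n)
{
    float acc[4] = {0, 0, 0, 0};
    for (int i = 0; i < n; i += 4) {     /* one "vec4 op" per iteration */
        acc[0] += data[i + 0];
        acc[1] += data[i + 1];
        acc[2] += data[i + 2];
        acc[3] += data[i + 3];
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

In an actual kernel you would use the built-in float4 type instead of four scalars; the point is the same: keep the data width a multiple of 4.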

       Use full 32 bit native registers for math.

      Both integer and floating-point math are natively 32-bit. 8-bit and 16-bit primitives will still use 32-bit registers, so there is no gain (for the math computation) in going with smaller sizes.

       Use floating point instead of integer formats

     1x 32-bit Integer pipeline (supports 32-bit INT formats in hardware, 8-bit/16-bit done in software)
     4x 32-bit Floating Point pipeline (supports 16-bit and 32-bit FP formats in hardware)

       To maximize OpenCL compute efficiency, it is better to convert integer formats to floating point to utilize the four (4) parallel FP math units.

       Use 16-bit integer division and 32-bit for other integer math operations

      For integer math (excluding division), there is one 32-bit integer adder and one 32-bit integer multiplier per core. If integer formats are required, use 32-bit integer formats for addition, multiplication, mask, and sign-extension operations.
      Try to minimize or avoid 8-bit and 16-bit integer formats, since they will be calculated in software and the 32-bit INT ALU will not be used.
      Integer division: the i.MX 6Q hardware supports only 16-bit integer division; software is used to implement 8-bit and 32-bit division.
      It is better to use 16-bit division if possible; there will be a performance penalty if 32-bit division is used.

       Use Round to Zero mode

     Floating point computation supports “round-to-zero” only (round-to-nearest-even is not required for EP, if round-to-zero is supported).

       Data accesses should be 16B

     For the most efficient use of the GPGPU’s L1 cache.
     Work-group size should be a multiple of thread-group size.
     Work-group size should be an integer multiple of the GPU's internal preferred work-group size (16 for GC2000) for optimum hardware usage.
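A small helper like the following (an assumed utility, not from the Freescale documentation) is the usual way to round a global size up to a multiple of the preferred local size; the kernel then needs a bounds check for the few padded work-items:

```c
#include <assert.h>
#include <stddef.h>

/* Round the global work size up to a multiple of the preferred local
 * work-group size (16 on GC2000). Work-items beyond the real problem size
 * should be guarded inside the kernel. */
static size_t round_up_global (size_t global, size_t local)
{
    return ((global + local - 1) / local) * local;
}
```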

       Keep Global work size at 64K (maximum) per dimension

      Since global IDs for work-items are 16 bits, it is necessary to keep the global work size within 64K (65,536 or 2^16) per dimension.
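If the data set exceeds this limit, the usual workaround is to split the work into multiple enqueues (or add a dimension). A hypothetical helper for computing the number of launches:

```c
#include <assert.h>
#include <stddef.h>

/* With global IDs limited to 16 bits, a 1D problem larger than 65536 items
 * must be split into several clEnqueueNDRangeKernel calls (or given an
 * extra dimension). This computes the number of 64K launches needed. */
static size_t launches_needed (size_t total_items)
{
    const size_t max_per_launch = 65536;   /* 2^16 work-items per dimension */
    return (total_items + max_per_launch - 1) / max_per_launch;
}
```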

       Conditional Branching should be avoided if possible

      Branch penalties depend on the percentage of work-items that go down the branch paths.

This post is long enough for just some "introductory" information about i.MX6Q/D OpenCL EP. For more information, including a sample application, take a look at this good white paper provided by Freescale:


Saturday, August 8, 2015

New posts coming soon !

Hello Everybody !!!

Sorry for not posting anything new since last year, but unfortunately I have been running out of time to create applications for which I could share the full source code.

And even so, with just the old posts this blog still gets 2K views per month. THANK YOU!

I am preparing a set of posts (at least 2 per month) that will also involve GPU acceleration in our computer vision topics!! The use of OpenCV is the main purpose of this blog, but not all posts will be based on it.

Stay tuned!


Wednesday, August 13, 2014

Onboard Camera - V4L wrapper with Yocto Project

Based on my last post, a lot of people are having problems using this wrapper in Yocto, since it uses a kernel header (mxcfb.h) that differs a bit from LTIB's, so this post should help solve the issue.

We have a nice tool on Linux that most people don't use (myself included) and that can deal with dependencies like a charm! This tool is called Autotools, yes, the one that creates the practical and easy ./configure, make and make install system. Below I will demonstrate how we can use it to solve our issue:


We now need to add kernel-dev to our local.conf file (in build/conf), which will then look like:

     BB_NUMBER_THREADS ?= "${@oe.utils.cpu_count()}"
     PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"
     MACHINE ??= 'imx6qsabresd'
     DISTRO ?= 'poky'
     PACKAGE_CLASSES ?= "package_rpm"
     EXTRA_IMAGE_FEATURES = "debug-tweaks"
     USER_CLASSES ?= "buildstats image-mklibs image-prelink"
     PATCHRESOLVE = "noop"
     BB_DISKMON_DIRS = "\
         STOPTASKS,${TMPDIR},1G,100K \
         STOPTASKS,${DL_DIR},1G,100K \
         STOPTASKS,${SSTATE_DIR},1G,100K \
         ABORT,${TMPDIR},100M,1K \
         ABORT,${DL_DIR},100M,1K"
     PACKAGECONFIG_pn-qemu-native = "sdl"
     ASSUME_PROVIDED += "libsdl-native"
     CONF_VERSION = "1"

     PARALLEL_MAKE = '-j 4'

     DL_DIR ?= "${BSPDIR}/downloads/"

     DISTRO_FEATURES_remove = "x11 wayland"

     CORE_IMAGE_EXTRA_INSTALL += "gpu-viv-bin-mx6q gpu-viv-bin-mx6q-dev kernel-dev"


     $ bitbake core-image-base


     If the mxcfb.h file doesn't appear in your sysroot (/opt/poky/1.6.1/sysroots/cortexa9hf-vfp-neon-poky-linux-gnueabi/), then you need to rebuild your toolchain and reinstall it as well:

     rebuild with -c populate_sdk:

          $ cd fsl-community-bsp/build
          $ bitbake core-image-base -c populate_sdk

     and then install:

          $ cd /fsl-community-bsp/build/tmp/deploy/sdk
          $ ./


     Now for the interesting part: we will need 2 different files (pretty simple ones actually): and

     Based on my usual project configuration, I like to have bin, include and src folders, so the sample below uses those:

     Project Tree:
             Project folder
 a)                  -----
 b)                  -----
                     ----- bin /
                     ----- src /
                     |        |
 c)                  |        -----
                     |        |
                     |        ------ camera_test.c
                     |        |
                     |        ------ v4l_wrapper.c
                     ----- include /
                              ------ v4l_wrapper.h


              AUTOMAKE_OPTIONS = foreign
              SUBDIRS = src          # add just those that will have a Makefile inside.

              AC_INIT([camera_test], [0.1], [])
              AM_INIT_AUTOMAKE([-Wall -Werror foreign])
              AC_PROG_CC
              AC_CHECK_HEADERS([fcntl.h stdint.h stdlib.h string.h sys/ioctl.h unistd.h])

              # Checks for library functions.
              AC_CHECK_FUNCS([gettimeofday memset munmap])

              # Output the Makefiles
              AC_CONFIG_FILES([Makefile src/Makefile])
              AC_OUTPUT



              bin_PROGRAMS = camera_test

              camera_test_SOURCES = camera_test.c ../include/v4l_wrapper.h
              nodist_camera_test_SOURCES = ../src/v4l_wrapper.c

              TARGET_PATH_LIB = $(ROOTFS_DIR)/usr/lib
              TARGET_PATH_INCLUDE = $(ROOTFS_DIR)/usr/include

              AM_CPPFLAGS = -I $(prefix)/usr/src/kernel/include/uapi -I $(prefix)/usr/src/kernel/include/ -I $(TARGET_PATH_INCLUDE) -I ../include

              AM_LDFLAGS = -Wl,--library-path=$(TARGET_PATH_LIB),-rpath-link=$(TARGET_PATH_LIB) -lm -L $(prefix)/usr/lib

              # workaround to get the binaries copied to the local bin/ folder
              all-local:
                        mv ../src/$(bin_PROGRAMS) ../bin

              clean-local:
                        rm -rf ../bin/*
                        rm -rf ../src/*.o

     * If you are not going to use any additional sources, you can just set camera_test_SOURCES = camera_test.c


     In a clean terminal enter the following commands to generate the files:

          $ autoheader
          $ autoreconf --install

     It will create all the necessary files to build your application, including dealing with dependencies, even when they come from the kernel.


     To build any application using Yocto (unless you have a recipe for it), you have to export the toolchain environment variables. Once that is done, the terminal you used is "dirty", and if you want to do all the steps above again you will need another (clean) one.

          $ cd /opt/poky/1.6.1/
          $ .  ./environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi
          $ cd /home/usr/v4l_wrapper_yocto/
          $ ./configure --host=arm-poky-linux-gnueabi --prefix=$SDKTARGETSYSROOT
          $ make

     At this point you will have your binary placed in the bin/ folder. If you enter the make install command, it will install (copy) the application binary into your rootfs/bin, where the rootfs is defined by the --prefix in the ./configure line.


The sample code can be found here.

Thanks to Rogério Pimentel, who helped me with autotools and with solving the mxcfb.h dependency issue, and to Daiane Angolini for the Yocto help.


Monday, May 19, 2014

Onboard Camera - V4L wrapper for use with OpenCV

Many people have been asking how to use the onboard camera (the CSI and MIPI cameras that come with Freescale's development boards) with OpenCV.

Unfortunately the mxc driver for this camera is not compatible with OpenCV and there is no way to use it directly; instead, we can use wrapper functions to access it. In this post I will share with you guys some utility functions I created (based on an IPU demo app from Rogerio Pimentel, a Freescaler) that use V4L to access the camera.

When using these functions we will get frames in UYVY format (YUV422), and to use them with OpenCV they must be converted to RGB24 (24bpp); all of this is included in the source code.
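For reference, a single UYVY macropixel (U0 Y0 V0 Y1, covering two horizontally adjacent pixels) can be converted to RGB24 roughly as below. This is a sketch using integer BT.601 coefficients; the wrapper's actual conversion code (in the linked source) may use different coefficients or a lookup table:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clamp_u8 (int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Convert one UYVY macropixel (U0 Y0 V0 Y1 -> two RGB pixels) using
 * fixed-point BT.601 coefficients scaled by 256 (illustrative sketch). */
static void uyvy_to_rgb24 (const uint8_t uyvy[4], uint8_t rgb[6])
{
    int u = uyvy[0] - 128;                    /* shared chroma */
    int v = uyvy[2] - 128;
    for (int i = 0; i < 2; i++) {
        int y = uyvy[1 + 2 * i];              /* Y0, then Y1 */
        rgb[3 * i + 0] = clamp_u8 (y + (359 * v) / 256);            /* R */
        rgb[3 * i + 1] = clamp_u8 (y - (88 * u + 183 * v) / 256);   /* G */
        rgb[3 * i + 2] = clamp_u8 (y + (454 * u) / 256);            /* B */
    }
}
```

A full-frame conversion just walks the UYVY buffer 4 bytes at a time, emitting 6 RGB bytes per step.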

The code can be downloaded from here. It also comes with a sample application; you just need to export the ROOTFS environment variable and check that the toolchain you are going to use is the same one defined in the Makefile, otherwise you must change that too.

The demo application (tested on an i.MX6 sabresd board) performs the Canny edge detector and displays the result using cvShowImage. You could display it directly on the framebuffer using the utility function; just uncomment those functions in the code and also add the conversion from RGB888 to YUV422.

To run this application, make sure to have these drivers loaded:

modprobe ov5642_camera
modprobe ov5640_camera_mipi
modprobe mxc_v4l2_capture

here is the result:

ps: There is a "small" issue in these wrapper functions: as they use multiple buffers for capture and output (if you use that instead of cvShowImage), you need to call the V4LWrapper_QueryFrame function at least 3 times in a row to fill all the buffers before displaying, so the best scenario is using this function like cvQueryFrame. When I get a fix for this I will update this post. Meanwhile, just call this function inside a loop and you are good to go =)


Wednesday, February 26, 2014

Building OpenCV-2.4.X for Freescale's i.MX6 BSP (Yocto)

Lately a lot of people are working with the Yocto Project, and many of them migrated from LTIB (like me). Yocto uses a different concept for adding new packages/applications to the system: now everything is based on RECIPES. As it is highly used, the number of packages (recipes) already included is very big and it keeps increasing continuously. Luckily for us, the recipe for OpenCV is already there; we just need to configure the system to add it for us.

In order to get everything up and running, we will divide the task into steps:

Step #1 - Installing Yocto

As our focus is to install OpenCV, for the Yocto install procedure we can use this very good tutorial created by Daiane:

Step #2 - Enabling OpenCV

As we already have the OpenCV recipe in our Yocto release, we just need to add the packages we want to our local.conf file, located at /yocto/fsl-community-bsp/build/conf. With some modifications (the opencv packages), it should look like this:

    MACHINE ??= 'imx6qsabresd'
    DISTRO ?= 'poky'
    PACKAGE_CLASSES ?= "package_rpm"
    EXTRA_IMAGE_FEATURES = "debug-tweaks dev-pkgs"
    USER_CLASSES ?= "buildstats image-mklibs image-prelink"
    PATCHRESOLVE = "noop"
    BB_DISKMON_DIRS = "\
        STOPTASKS,${DL_DIR},1G,100K \
        ABORT,${TMPDIR},100M,1K \
        ABORT,${DL_DIR},100M,1K"
    PACKAGECONFIG_pn-qemu-native = "sdl"
    ASSUME_PROVIDED += "libsdl-native"
    CONF_VERSION = "1"

    PARALLEL_MAKE = '-j 4'

    DL_DIR ?= "${BSPDIR}/downloads/"

    CORE_IMAGE_EXTRA_INSTALL += "gpu-viv-bin-mx6q gpu-viv-bin-mx6q-dev"
    CORE_IMAGE_EXTRA_INSTALL += "libopencv-core-dev libopencv-highgui-dev \
        libopencv-imgproc-dev libopencv-objdetect-dev libopencv-ml-dev"

    LICENSE_FLAGS_WHITELIST = "commercial"

Note that we included the "-dev" packages; this is necessary if you always want the OpenCV headers/libraries included in the rootfs. Yocto is smart: if you don't add a "-dev" package, the libraries are only included when an application that uses them is built. As we always want our OpenCV headers and libraries available to build our applications, we use it this way.

Step #3 - Building OpenCV

Now the easy part:

/yocto/fsl-community-bsp/build$ bitbake core-image-x11

after the build is finished you can check the images generated by the bitbake command at:


and after extracting the rootfs (core-image-x11-imx6qsabresd.tar.bz2) you can find the OpenCV libraries in the /usr/lib folder:

andre@b22958:~/bsps/yocto/rootfs$ ls usr/lib/libopen*

(the listing shows the OpenCV shared libraries and their symlinks installed under /usr/lib, such as libopencv_core, libopencv_highgui, libopencv_imgproc, libopencv_objdetect and libopencv_ml)

ps: don't forget to flash the card with the image created at /tmp/deploy/images/imx6qsabresd/core-image-x11-imx6qsabresd.sdcard

$ sudo dd if=/build/tmp/deploy/images/imx6qsabresd/core-image-x11-imx6qsabresd.sdcard of=/dev/sdb

After those 3 steps you should be able to find all the OpenCV headers/libraries needed by most of your applications, but in case you need more dev packages, you can look at: /tmp/work/cortexa9hf-vfp-neon-poky-linux-gnueabi/opencv/2.4.6+gitAUTOINC+1253c2101b-r0/packages-split

Now that you have the OpenCV headers/libraries, we need the toolchain to build our sample application; just re-run the bitbake command, now adding the "-c populate_sdk" option to the command line:

/yocto/fsl-community-bsp/build$ bitbake core-image-x11 -c populate_sdk

and then run the install script created at: /yocto/fsl-community-bsp/build/tmp/deploy/sdk to install it.

With that you will be able to see the toolchain installed at: /opt/poky

Now we are able to test our sample code, just a camera test and you can find the source code here: camera_test_sample

To build this application you need a new terminal window (so all environment variables are clean); then run the environment setup:

$ cd /opt/poky/1.5+snapshot/
$ . ./environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi

and then go to the camera_test_yocto folder and type make. The binary will be placed in the bin folder.

Once you have flashed your card with the Yocto image (OpenCV included), mount the SD card on your host computer and then copy the binary to your rootfs.

To test it, run the application with the following command:

$ DISPLAY=:0 ./camera_test


Tuesday, April 2, 2013

Building OpenCV-2.4.X for Freescale's i.MX6 BSP (LTIB)

I was working on this post for a long time, and I didn't share any information before because I was facing a big problem getting the highgui library successfully built for our embedded system. As you may know, highgui is very important when you need to open a video stream, create a camera device or even work with windows in OpenCV.

Last week I managed to build it successfully using the latest Freescale's BSP (kernel 3.0.35) and I'm going to show you how to do that in the following lines.

First of all, you can download Freescale's i.MX6 BSP here!

Unfortunately, getting the latest BSP doesn't mean you are going to get all the latest packages. The only things that are really up to date in this BSP (and probably all others) are the kernel and the specific drivers for the hardware you are going to work with. The most common packages like GTK, GLIB2, PANGO, CAIRO, etc. are not updated, and in order to build the new OpenCV 2.4.X we will need to upgrade these packages and install a new one, v4l-utils.

So let's stop with all this blah blah blah and get to what is really interesting!

Step 1 - Building the dependencies
 1) Assuming you already have the BSP installed (min profile) on your host machine, you will need to upgrade the GLIB2 package and install the v4l-utils one.

2) Get the new spec files at:

3) The new GLIB2 now has a dependency, LIBFFI (you can also get the spec file of this lib in the link above).

4) Build & install libffi as follows:
    mkdir ../ltib/dist/lfs-5.1/libffi
    cp ../git/ltib-pkgs-upgrade/libffi.spec ../ltib/dist/lfs-5.1/libffi
    cp ../git/ltib-pkgs-upgrade/libffi-3.0.12-includedir-1.patch /opt/freescale/pkgs
    cp ../downloads/libffi.tar.gz /opt/freescale/pkgs
    * you can also find the link to download the libffi tarball in the link above.
    cd ../ltib
    ./ltib -p libffi.spec -m prep
    ./ltib -p libffi.spec -m scbuild
    ./ltib -p libffi.spec -m scdeploy

5) Build & install the new glib2
    cd ../ltib
    cp ../git/ltib-pkgs-upgrade/glib2.spec ../ltib/dist/lfs-5.1/glib2
    cp ../downloads/glib-2.35.9.tar.xz /opt/freescale/pkgs
    ./ltib -p glib2.spec -m prep
    ./ltib -p glib2.spec -m scbuild
    ./ltib -p glib2.spec -m scdeploy

6) Build & install the v4l-utils packages
    mkdir ../ltib/dist/lfs-5.1/v4l-utils
    cp ../git/ltib-pkgs-upgrade/v4l-utils.spec ../ltib/dist/lfs-5.1/v4l-utils
    cp ../downloads/v4l-utils-0.9.3.tar.bz2 /opt/freescale/pkgs
    cd ../ltib

    ./ltib -p v4l-utils.spec -m prep
    ./ltib -p v4l-utils.spec -m scbuild
    ./ltib -p v4l-utils.spec -m scdeploy

After all these steps you should have the necessary dependencies built, and we can now move on to building OpenCV 2.4.X.

Step 2 - Cross-Compiling OpenCV-2.4.X

When cross-compiling with cmake you need a special file containing the information about the toolchain you are going to use. In our case we can create this file as follows:

1) touch toolchain.cmake && vi toolchain.cmake

2) add the following information:

               # this one is important
               set( CMAKE_SYSTEM_NAME Linux )

               # this one not so much
               set( CMAKE_SYSTEM_PROCESSOR arm )

               # specify the cross compiler
               set( CMAKE_C_COMPILER /opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/arm-none-linux-gnueabi-gcc )
               set( CMAKE_CXX_COMPILER /opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/arm-none-linux-gnueabi-g++ )

               # where is the target environment - point to your rootfs here
               set( CMAKE_FIND_ROOT_PATH /home/andre/bsps/imx6x/1301/ltib/rootfs )

               # search for programs in the build host directories
               set( CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER )

               # for libraries and headers in the target directories
               set( CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY )
               set( CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY )

               # point to your rootfs path here
               set( CMAKE_CXX_FLAGS "-L/home/andre/bsps/imx6x/1301/ltib/rootfs/usr/lib" )

3) download OpenCV-2.4.X here or here.

4) untar the package:
    tar -xjvf OpenCV-2.4.2.tar.bz2

5) cd OpenCV-2.4.2
    mkdir build
    cd build
    cp ../../toolchain.cmake .
    cmake -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake ../

After these steps you should have the necessary build files in your build directory; you now just need to configure which extra packages you want to add or remove for your OpenCV profile.

In your build folder:

6) ccmake .

7) you can leave your configuration similar to mine:

BUILD_DOCS                       OFF
BUILD_EXAMPLES                   OFF
BUILD_JASPER                     OFF
BUILD_JPEG                       OFF
BUILD_PACKAGE                    ON
BUILD_PERF_TESTS                 ON
BUILD_PNG                        OFF
BUILD_TESTS                      ON
BUILD_TIFF                       OFF
BUILD_ZLIB                       OFF
BUILD_opencv_calib3d             ON
BUILD_opencv_contrib             ON
BUILD_opencv_core                ON
BUILD_opencv_features2d          ON
BUILD_opencv_flann               ON
BUILD_opencv_gpu                 ON
BUILD_opencv_highgui             ON
BUILD_opencv_imgproc             ON
BUILD_opencv_legacy              ON
BUILD_opencv_ml                  ON
BUILD_opencv_nonfree             ON
BUILD_opencv_objdetect           ON
BUILD_opencv_photo               ON
BUILD_opencv_stitching           ON
BUILD_opencv_ts                  ON
BUILD_opencv_video               ON
BUILD_opencv_videostab           ON
CMAKE_INSTALL_PREFIX             /home/andre/imx_applications/OpenCV-2.4.2/build/install
CMAKE_TOOLCHAIN_FILE             /home/andre/imx_applications/OpenCV-2.4.2/toolchain.cmake
CMAKE_VERBOSE                    OFF
CUDA_BUILD_CUBIN                 OFF
ENABLE_PROFILING                 OFF
EXECUTABLE_OUTPUT_PATH           /home/andre/imx_applications/OpenCV-2.4.2/build/bin
INSTALL_C_EXAMPLES               OFF
INSTALL_TO_MANGLED_PATHS         OFF
LIBRARY_OUTPUT_PATH_ROOT         /home/andre/imx_applications/OpenCV-2.4.2/build
OPENCV_CONFIG_FILE_INCLUDE_DIR   /home/andre/imx_applications/OpenCV-2.4.2/build
WITH_CUBLAS                      OFF
WITH_CUDA                        OFF
WITH_CUFFT                       ON
WITH_EIGEN                       ON
WITH_FFMPEG                      ON
WITH_GTK                         ON
WITH_JASPER                      OFF
WITH_JPEG                        ON
WITH_OPENEXR                     ON
WITH_OPENGL                      OFF
WITH_OPENNI                      OFF
WITH_PNG                         ON
WITH_PVAPI                       ON
WITH_QT                          OFF
WITH_TBB                         OFF
WITH_TIFF                        ON
WITH_UNICAP                      OFF
WITH_V4L                         ON
WITH_XIMEA                       OFF
WITH_XINE                        OFF
We basically removed the GPU acceleration (we don't support full-profile OpenCL), enabled the v4l library and disabled some unnecessary ones; you can try your own configuration if you wish.
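Once the library is built and deployed to the target, one quick way to double-check which of these options actually made it into your build is to print OpenCV's build information at runtime (a minimal sketch; `cv::getBuildInformation()` is available from OpenCV 2.4 onwards):

```cpp
// Print the compiled-in configuration so you can confirm, on the target,
// that V4L support is enabled and the CUDA/GPU options are off.
#include <opencv2/core/core.hpp>
#include <iostream>

int main() {
    std::cout << cv::getBuildInformation() << std::endl;
    return 0;
}
```

Look for the "Video I/O" section of the output to confirm V4L is listed.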

8) make -j

9) make install

If you reached step 9 with no issues, you should have your whole cross-built OpenCV in the install folder. Just copy it to your rootfs:

10) sudo cp -a build/install ../ltib/rootfs/usr

and you now have an OpenCV-2.4.X installation in your embedded system.

Now, the result of a simple piece of code that opens a camera device and displays the image:
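The demo video that was embedded here showed frames from a camera being displayed on screen. A minimal sketch of such a test is below (assumptions: the camera is at index 0 and the window name is mine; this is the classic highgui capture loop, not necessarily the exact code used in the video):

```cpp
// Open the first camera device via highgui (V4L backend on the i.MX6)
// and display frames in a window until ESC is pressed. OpenCV 2.4 C++ API.
#include <opencv2/highgui/highgui.hpp>

int main() {
    cv::VideoCapture cap(0);           // /dev/video0
    if (!cap.isOpened())
        return 1;                      // no camera found

    cv::namedWindow("camera_test");
    cv::Mat frame;
    while (cap.read(frame)) {          // grab + decode one frame
        cv::imshow("camera_test", frame);
        if (cv::waitKey(10) == 27)     // ESC quits
            break;
    }
    return 0;
}
```

On the target, run it against the framebuffer X server with `DISPLAY=:0 ./camera_test`, as shown earlier.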



I got some issues building GTK when following the procedure above, so here is how to avoid any headache:
  1. Install X11        (package list)
  2. Install GTK+       (package list)
  3. Add the LIBFFI and V4L-UTILS packages as described earlier.
  4. Upgrade the GLIB2 package as described earlier.
  5. Cross-compile the OpenCV library.


EOF² !