Our objective was to accelerate the Gray-Level Co-Occurrence Matrix (GLCM) calculation. Profiling the application showed that the bottleneck was mainly the data access pattern: accesses frequently missed the cache. The optimizations described in the literature did not yield a speedup, because the matrix sizes and gray-level bit depth in our case were significantly smaller than those used in the published work. As a result, the overheads of those techniques outweighed the gain from more sequential data access.
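For reference, a straightforward GLCM computation for a single displacement can be sketched in C as below. This is a minimal illustration, not the customer's code: the function name `glcm_naive`, the row-major 8-bit image layout, and the restriction to non-negative offsets `dx`, `dy` are assumptions. The increment is indexed by the pixel values themselves, so the write address depends on image content rather than on the loop counter, which is the kind of scattered access that tends to miss the cache.

```c
#include <stdint.h>
#include <string.h>

/* Baseline (non-symmetric) GLCM for one displacement (dx, dy).
 * Assumptions (hypothetical interface): img is a row-major w*h image with
 * gray levels in [0, levels); dx >= 0 and dy >= 0; glcm holds levels*levels
 * counters, row = reference pixel level, column = neighbour pixel level. */
void glcm_naive(const uint8_t *img, int w, int h, int levels,
                int dx, int dy, uint32_t *glcm)
{
    memset(glcm, 0, (size_t)levels * levels * sizeof *glcm);
    for (int y = 0; y + dy < h; ++y) {
        for (int x = 0; x + dx < w; ++x) {
            uint8_t a = img[y * w + x];                /* reference pixel */
            uint8_t b = img[(y + dy) * w + (x + dx)];  /* neighbour pixel */
            /* Data-dependent write: consecutive iterations may hit
             * unrelated cells of the matrix. */
            glcm[a * levels + b]++;
        }
    }
}
```

For the 4x4 image with rows `0 0 1 1`, `0 0 1 1`, `0 2 2 2`, `2 2 3 3`, four gray levels and offset `(1, 0)`, the function counts 12 horizontal pairs in total.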
A thorough study of the algorithm showed that some of the loops could be unrolled and their initialization moved out of them; combined with data alignment, this gave approximately a twofold speedup. The next level of optimization was to parallelize the algorithm on the CPU and to reduce the number of accesses to the output array while preserving correct results, which gave an additional 1.8× speedup. Overall, array alignment, memory optimization and parallelization on a 4th-generation Intel Core i5 reduced wall time approximately 3.3 times and CPU time 1.6 times, a significant speedup for the customer's fairly demanding applications.
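The optimizations listed above can be combined in one function, sketched here under several assumptions: OpenMP as the CPU parallelization API (the text does not name one), an unroll factor of 4, 64-byte cache lines, and per-thread private matrices merged at the end as one possible way of reducing accesses to the shared output array. The name `glcm_parallel` is hypothetical, and this is an illustration of the techniques rather than the implementation behind the reported figures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Round a byte count up to a whole number of cache lines. */
static size_t round_up(size_t n) { return (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE; }

/* Hypothetical optimized GLCM; same layout assumptions as a naive version
 * (row-major 8-bit image, dx >= 0, dy >= 0). Returns 0 on success, -1 if
 * allocation fails. */
int glcm_parallel(const uint8_t *img, int w, int h, int levels,
                  int dx, int dy, uint32_t *glcm)
{
    size_t cells = (size_t)levels * levels;
    /* Pad each private matrix to whole cache lines to avoid false sharing. */
    size_t stride = round_up(cells * sizeof(uint32_t)) / sizeof(uint32_t);
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    /* Byte count is a multiple of CACHE_LINE by construction, as aligned_alloc expects. */
    size_t bytes = stride * (size_t)nthreads * sizeof(uint32_t);
    uint32_t *priv = aligned_alloc(CACHE_LINE, bytes);
    if (!priv) return -1;
    memset(priv, 0, bytes); /* initialization done once, outside the hot loops */

    int rows = h - dy, cols = w - dx;
#pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        uint32_t *m = priv + (size_t)tid * stride; /* this thread's private matrix */
#pragma omp for schedule(static)
        for (int y = 0; y < rows; ++y) {
            const uint8_t *ra = img + (size_t)y * w;             /* reference row */
            const uint8_t *rb = img + (size_t)(y + dy) * w + dx; /* neighbour row */
            int x = 0;
            for (; x + 4 <= cols; x += 4) { /* unrolled by 4 */
                m[ra[x] * levels + rb[x]]++;
                m[ra[x + 1] * levels + rb[x + 1]]++;
                m[ra[x + 2] * levels + rb[x + 2]]++;
                m[ra[x + 3] * levels + rb[x + 3]]++;
            }
            for (; x < cols; ++x) /* remainder when cols % 4 != 0 */
                m[ra[x] * levels + rb[x]]++;
        }
    }
    /* Merge: each cell of the shared output is written exactly once. */
    for (size_t c = 0; c < cells; ++c) {
        uint32_t s = 0;
        for (int t = 0; t < nthreads; ++t) s += priv[(size_t)t * stride + c];
        glcm[c] = s;
    }
    free(priv);
    return 0;
}
```

Compiled without `-fopenmp`, the pragmas are ignored and the function runs on a single thread with the same result. The private copies cost `levels * levels * nthreads` extra counters in exchange for removing contended or atomic writes from the hot loop, a trade that is likely to be cheap when the number of gray levels is small, as in the case described here.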