Wednesday, January 30, 2008

We should try to reduce initialization work ourselves


I tried to implement an efficient way to build an isotropic image. For a long time, I thought compiler optimizations would remove the initialization overhead automatically, but Visual C++ 2005 does not.

Consider the following code:

for( int z = 0; z < nNewSizeZ-2; z++ ) {
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            float fX, fY, fZ; // for true position
            float fa, fb, fc; // relative voxel distance as in 2.1.
            NVVoxel arVoxels[8];
            NVImage_t arValues[8]; // keep H.U. values of relevant voxels
            ...
        }
    }
}


In fact, we can move all the declarations outside the outermost loop (z) as follows:

float fX, fY, fZ; // for true position
float fa, fb, fc; // relative voxel distance as in 2.1.
NVVoxel arVoxels[8];
NVImage_t arValues[8]; // keep H.U. values of relevant voxels

for( int z = 0; z < nNewSizeZ-2; z++ ) {
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            ...
        }
    }
}

Things change, however, when the code is parallelized with OpenMP. Suppose we share the iterations of the y loop among threads:

#pragma omp parallel
for( int z = 0; z < nNewSizeZ-2; z++ ) {
    #pragma omp for
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            float fX, fY, fZ; // for true position
            float fa, fb, fc; // relative voxel distance as in 2.1.
            NVVoxel arVoxels[8];
            NVImage_t arValues[8]; // keep H.U. values of relevant voxels
            ...
        }
    }
}

We can see that, for each z, the iterations of the y loop will be divided between two threads (on a dual-core machine). Hence, if we move the declarations outside the outermost loop, the two threads will use the same variable storage, and that is not correct. What we should do instead is move the declarations just one step outward, as shown below:

#pragma omp parallel
for( int z = 0; z < nNewSizeZ-2; z++ ) {
    #pragma omp for
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        float fX, fY, fZ; // for true position
        float fa, fb, fc; // relative voxel distance as in 2.1.
        NVVoxel arVoxels[8];
        NVImage_t arValues[8]; // keep H.U. values of relevant voxels

        for( int x = 0; x < nNewSizeX-2; x++ ) {
            ...
        }
    }
}

From my experiment, this makes things much faster, since array initialization is expensive. Note: we could reduce initialization even further by dividing the loop ourselves into multiple sections and declaring a separate set of variables for each section. I will not do this, however, since the code would become cluttered, and we would have to know the number of sections beforehand, which is normally equal to the number of physical cores in the system. Changing the CPU could therefore cause performance problems. Using OpenMP as shown above, on the other hand, is very flexible with respect to system configuration.
