I tried to implement an efficient way to build an isotropic image. For a long time, I assumed the compiler would remove the initialization overhead automatically, but Visual C++ 2005 does not.
Consider the following code:
for( int z = 0; z < nNewSizeZ-2; z++ ) {
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            float fX, fY, fZ;               // for true position
            float fa, fb, fc;               // relative voxel distance as in 2.1.
            NVVoxel<NVImage_t> arVoxels[8];
            NVImage_t arValues[8];          // keep H.U. values of relevant voxels
            ...
        }
    }
}
In fact, we can move all the declarations outside the outermost loop (z), as follows:
float fX, fY, fZ;               // for true position
float fa, fb, fc;               // relative voxel distance as in 2.1.
NVVoxel<NVImage_t> arVoxels[8];
NVImage_t arValues[8];          // keep H.U. values of relevant voxels
for( int z = 0; z < nNewSizeZ-2; z++ ) {
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            ...
        }
    }
}
Things change, however, when we want to parallelize the z loop with OpenMP. Consider the original code again:
#pragma omp parallel for
for( int z = 0; z < nNewSizeZ-2; z++ ) {
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            float fX, fY, fZ;               // for true position
            float fa, fb, fc;               // relative voxel distance as in 2.1.
            NVVoxel<NVImage_t> arVoxels[8];
            NVImage_t arValues[8];          // keep H.U. values of relevant voxels
            ...
        }
    }
}
On a dual-core machine, for example, OpenMP divides the iterations of the z loop between two threads. Hence, if we move the declarations outside the outermost loop, both threads will use the same variable storage, and that is not correct. What we should do is move the declarations just one step in, as shown below:
#pragma omp parallel for
for( int z = 0; z < nNewSizeZ-2; z++ ) {
    float fX, fY, fZ;               // for true position
    float fa, fb, fc;               // relative voxel distance as in 2.1.
    NVVoxel<NVImage_t> arVoxels[8];
    NVImage_t arValues[8];          // keep H.U. values of relevant voxels
    for( int y = 0; y < nNewSizeY-2; y++ ) {
        for( int x = 0; x < nNewSizeX-2; x++ ) {
            ...
        }
    }
}
From my experiments, this makes things much faster, since array initialization is expensive. Note that we could reduce the initialization even further by dividing the loop ourselves into multiple sections, each declaring its own variables. I will not do this, however, since it clutters the code and requires knowing the number of sections beforehand, which is normally the number of physical cores in the system; changing the CPU could then cause a performance issue. Using OpenMP as shown above, by contrast, stays flexible with respect to the system configuration.