c++ while loop versus For loop

**hod139** · June 20th, 2006

All,

Background: Some code I've been writing required a lightweight matrix/vector class with some basic linear algebra. After writing up something quick, I decided to compare the running time of my matrix-matrix multiply with Boost's ublas library. My code was slower and I was confused why as matrix-matrix multiplication is fairly straight forward. A colleague of mine suggested replacing the for loops with corresponding decrementing while loops. To my shock, using while loops improved my running time and it was as fast as Boost's. This is confusing to me, as I would have expected g++ to produce the same assembly for such a straight forward segment of code.

My friend didn't know why decrementing while loops were faster than incrementing for loops, but in all the vision code he works with he noticed that while loops were always used in preference to for loops. I did some quick googling, and the best I could come up with was that most CPUs tend to have better branching tools for >= 0 tests. But replacing the incrementing for loops with decrementing for loops still did not net the gain in performance found when using decrementing while loops.

Questions:
1. Does anyone here have a guess as to why g++ does not produce the same assembly code for the corresponding methods?
2. Why is the decrementing while loop implementation much faster than the incrementing for loops (at least for the first run)?
3. The subsequent improvements in running time for the FOR loop implementation when calculating an average time over 20 runs is probably due to caching; but the running time eventually overtakes that of the while loop! Why is the performance increase in the FOR loop over subsequent calls so large, eventually beating the while loop?

The methods:

Code:

Matrix<T> mult_FOR(const Matrix<T>& rhs) const{
    assert(nCol_ == rhs.size1() );     
    size_t rhs_size2 = rhs.size2();
    Matrix<T> theMatrix(nRow_, rhs_size2);
    size_t dataLoc;
    for (size_t r=0; r < nRow_; ++r) {
       for (size_t c=0 ; c < rhs_size2; ++c) {
          T& x = theMatrix(r, c);
          dataLoc = nCol_*r;
          for (size_t i=0; i < nCol_; ++i) { 
             x += data_[dataLoc++] * rhs(i,c);
         }
      }
   }
   return theMatrix;
}

Code:

Matrix<T> mult_WHILE(const Matrix<T>& rhs) const{
    assert(nCol_ == rhs.size1() );
    int rhs_size2 = rhs.size2();
    Matrix<T> theMatrix(nRow_, rhs_size2);
    int dataLoc;
    
    int r, c, i;
    r = nRow_ -1;
    do{
      c = rhs_size2 -1;
      do{
         T& x = theMatrix(r, c); 
         i = nCol_ -1;
         dataLoc = nCol_*r + i; 
         do{     
            x += data_[dataLoc--] * rhs(i,c);
            --i;
         } while(i >= 0);
      --c;
      } while (c >= 0);
      --r;
    } while(r >=0);
    return theMatrix;
}

and execution:

Code:

 g++ -O2 Post.cpp
./a.out
First time for mult_FOR (seconds): 2.76
Average time for mult_FOR (seconds): 2.0005
Deviation for mult_FOR (seconds): 0.81
First time for mult_WHILE (seconds): 2.28
Average time for mult_WHILE (seconds): 2.27
Deviation for mult_WHILE (seconds): 0.02

If anyone wants the entire file, I have attached it (I had to name it Post.txt as it would not let me upload Post.cpp). here: Post.txt

**thumper** · June 20th, 2006

Just a small point, but have you tried running the mult_WHILE first and the mult_FOR second? It could just be the initial paging.

Also, one difference in your code is that the while version uses int whereas your for version uses size_t. Depending on your platform, these may be different.

**hod139** · June 20th, 2006

Originally Posted by thumper

Just a small point, but have you tried running the mult_WHILE first and the mult_FOR second? It could just be the initial paging.

Thanks for this suggestion. When I switched the call order to mult_WHILE first, and mult_FOR second, the large deviation in time was now with mult_WHILE, so the initial paging must be the culprit for the time variation.

Also, one difference in your code is that the while version uses int whereas your for version uses size_t. Depending on your platform, these may be different.

I had changed the datatype to int in the decrementing while version since the values in the iteration had to go to -1. I forgot to change them in the FOR version for consistency, thanks for pointing that out.

Listening to your suggestions, I changed the methods to use consistent datatypes and I now ignore the timing results of the first call. The results are: (I was getting impatient so I decreased the size of the matrices being multiplied, which is why the time decreased):

Code:

Results when calling mult_WHILE first
Average time for mult_WHILE (seconds): 0.7535
Average time for mult_FOR (seconds): 0.5875
Results when calling mult_FOR first
Average time for mult_FOR (seconds): 0.597
Average time for mult_WHILE (seconds): 0.755

Now it would appear that the FOR loop implementation is faster!! The code that generated the above results is attached here: Post2.txt.

At least my question of timing deviations between the first run and subsequent runs is answered, but why is there still a large difference in time between the FOR loop implementation and the WHILE loop implementation?

**thumper** · June 20th, 2006

Many compilers have optimisations that they run over for loops as they are probably the most common looping construct that is used.

I have some style comments if you are interested in hearing them.

**hod139** · June 20th, 2006

Originally Posted by thumper

I have some style comments if you are interested in hearing them.

Not my original intent with this post, but yeah I'd like to hear your comments.

**thumper** · June 20th, 2006

1) Use typename instead of class - small thing but double isn't a class

Code:

template <typename T>
class Matrix
{
   //...
};

2) Put things in a namespace - I admit that this is just sample code, and the original may well be, but anyway, it is good practice.

3) Methods size1 and size2 could be more readable as row_size and col_size, or sum such.

4) Better to write operator= with a copy constructor and swap.

5) Initialiser lists are better practice than assignment in constructors.

6) Doing an assert after a new isn't going to give you anything because by default if new cannot allocate it throws an exception.

7) Perfer non-member non-friend functions.

Code:

template <typename T>
Matrix<T> operator*(Matrix<T> const& lhs, Matrix<T> const& rhs)

8) More descriptive variable names are good when coming back to read your own code.

9) Does it really make sense to have a default arg for the operator() ?

**hod139** · June 20th, 2006

Thumper, thanks for the feedback. I have posted responses below all of your suggestions.

Originally Posted by thumper

1) Use typename instead of class - small thing but double isn't a class

Code:

template <typename T>
class Matrix
{
   //...
};

To be honest I wrote class simply because it is shorter

. Is there a reason I should choose one over the other (other than clarifying the expected type as you noted).

2) Put things in a namespace - I admit that this is just sample code, and the original may well be, but anyway, it is good practice.

In actual code it is.

3) Methods size1 and size2 could be more readable as row_size and col_size, or sum such.

I agree completely, but Boost's ublas library used size1 and size2; and this class is meant to replace our dependence on ublas. To make porting easier, I simply used their naming convention.

4) Better to write operator= with a copy constructor and swap.

Guess I don't understand why with this point. My copy constructor only calls the private copy method, so why is operator= better calling the copy constructor and a swap?

5) Initialiser lists are better practice than assignment in constructors.

For primitive types does it matter? Why is it better practice?

6) Doing an assert after a new isn't going to give you anything because by default if new cannot allocate it throws an exception.

Thanks, must be leftover practice from my malloc days.

7) Perfer non-member non-friend functions.

Code:

template <typename T>
Matrix<T> operator*(Matrix<T> const& lhs, Matrix<T> const& rhs)

8) More descriptive variable names are good when coming back to read your own code.

Why prefer non-member non-friend functions over member functions? Why is a non-member operator* preferential to a member operator*

Code:

Matrix<T> operator*(const Matrix<T>& rhs) const{

9) Does it really make sense to have a default arg for the operator() ?

It was when I was using this class to also store vectors. Then I could access the element using a single param. Now that I also wrote a Vector class, you are correct in that the default is no longer needed.

**LordHunter317** · June 20th, 2006

Originally Posted by hod139

Guess I don't understand why with this point. My copy constructor only calls the private copy method, so why is operator= better calling the copy constructor and a swap?

Because calling copy constructor and swap automatically handles self-assignment correctly in the face of exceptions, which is why it used in 99% of cases.

Your function as written isn't exception safe. If copy() throws, your object will go into an undefined state.

For primitive types does it matter? Why is it better practice?

In practice it doesn't matter for primitives. The issue is that if you don't use an initalizer list, everything gets set a default value before the constructor body is run. For a primitive, this isn't a big deal. For other things, it is.

Why prefer non-member non-friend functions over member functions?

It improves encapsulation: a non-friend non-member function can only be dependent on your classes' public interface. This means that you can change class implementation without affecting them.

**hod139** · June 20th, 2006

Originally Posted by LordHunter317

It improves encapsulation: a non-friend non-member function can only be dependent on your classes' public interface. This means that you can change class implementation without affecting them.

But I can always change which methods are public in the class. Too me, using non-friend non-member functions seems to be the C way of doing things, where you have all these stucts laying around and accessor methods to them.

If the method is not going to alter the state of the class, you can declare it const, which gains the same protection as a non-friend non-member function.

**LordHunter317** · June 20th, 2006

Originally Posted by hod139

But I can always change which methods are public in the class.

Which is an API break, but sure. It doesn't change the fact that non-member, non-friend functions depend on fewer details then member functions or non-friend functions.

Basically, encapsulation is enhanced because there are fewer things you can change that would require you to change the calling function.

Too me, using non-friend non-member functions seems to be the C way of doing things,

They're clearly not. The STL uses them everywhere, as does the standard library.

where you have all these stucts laying around and accessor methods to them.

No, that's not what's being suggested. Accessor methods are dependent on private details, hence they should be member functions. But functions that only need the accessor methods aren't dependent on private details, hence they shouldn't be member functions.

If the method is not going to alter the state of the class, you can declare it const, which gains the same protection as a non-friend non-member function.

No, it does not. You're confusing mutability with encapsulation. I could have a non-member function that alters state (e.g., for_each()) but is still non-member. Const means the class state is immutable. Non-friend, non-member means I can only access public details.

Thread: c++ while loop versus For loop

Thread Tools

Display

c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Re: c++ while loop versus For loop

Bookmarks

Bookmarks

Posting Permissions