- 01-10-2009 #1 Linux User
- Join Date: May 2008
- Location: NYC, moved from KS & MO
- Posts: 251
Weird C loop performance comparison result
Hi all,
Recently I was doing some comparisons of for loops between C and Python, and it turns out that C is a lot faster. That was within my expectation. However, there's something I couldn't figure out. After comparing C and Python, I did some comparisons between two simple C programs:
```
$ cat cloop.c
#include <stdio.h>

int main(void)
{
    register int i = 0;
    for (i = 0; i < 999999999; i++)
        ;
}
```

```
$ cat cloop2.c
#include <stdio.h>

int main(void)
{
    register int i = 0;
    register int a = 5;
    register int b;
    for (i = 0; i < 999999999; i++)
        b = a + 3;
}
```

In theory cloop should run faster than cloop2, but cloop2 is slightly faster than cloop, as my test results show:

```
$ date; ./cloop; date
Sat Jan 10 18:36:19 EST 2009
Sat Jan 10 18:36:24 EST 2009
--->5 seconds
[ BTW on the same machine it took python 134 seconds to finish the same empty for loop ]

$ date; ./cloop2; date
Sat Jan 10 18:35:09 EST 2009
Sat Jan 10 18:35:13 EST 2009
--->4 seconds
```

Also, the test results are pretty consistent, no matter how many times I run them or which program runs first.
If I change the loop count to 1,999,999,999, both programs take the same time: 8 seconds.
The above tests were done on my AMD 64 x2 3800 (Ubuntu 8.10, 686 kernel). No special optimisation flags were used when compiling. Can someone tell me why the empty C loop is no faster than the one that does the calculation?
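Part of the answer may be measurement resolution: `date` prints whole seconds, so each of the 4 s and 5 s readings can be off by almost a second, and the two results may not really differ. The shell's `time` builtin (`time ./cloop`) reports sub-second figures. Alternatively, the minimal sketch below times the loop inside the program with POSIX `clock_gettime()`. The function name `time_counting_loop` is hypothetical, and declaring the counter `volatile` is an assumption added here (it is not in the original programs) so that an optimizing compiler cannot delete the empty loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() in <time.h> */
#endif
#include <time.h>

/* Hypothetical helper: times an otherwise-empty counting loop of n
   iterations and returns the elapsed wall-clock time in seconds.
   The counter is volatile, so every increment must really be
   performed, even when compiling with -O2. */
double time_counting_loop(long n)
{
    struct timespec start, end;
    volatile long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < n; i++)
        ;   /* empty body, as in cloop.c */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `time_counting_loop(999999999L)` from `main` and printing the result, for example with `printf("%.3f s\n", t)`, would give a finer-grained comparison than bracketing the program with `date`. Because the counter is forced into memory, absolute times may differ from the original programs.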
- 01-10-2009 #2 Linux Engineer
- Join Date: Apr 2006
- Location: Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts: 1,117
Hi.
You would need to look at the compiled code to be certain, but I'd guess that the compiler's analysis showed that the statement does not depend on the loop (it is loop-invariant), so it optimized it out of the loop ... cheers, drl

Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
We look forward to helping you with the challenge of the other 10%.
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
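drl's guess refers to loop-invariant code motion: an expression whose value does not change from one iteration to the next can be computed once, before the loop. The sketch below (function names are hypothetical) performs that transformation by hand on the cloop2 loop body so the two forms can be compared. It shows what an optimizing compiler is allowed to do, not what GCC actually emitted for these programs.

```c
/* As written in cloop2.c: a + 3 is recomputed on every iteration,
   even though it never depends on i. */
int invariant_in_loop(int n)
{
    int a = 5;
    int b = 0;
    for (int i = 0; i < n; i++)
        b = a + 3;
    return b;
}

/* After hoisting by hand: the addition happens once, outside the
   loop, and only a plain copy remains inside it. A compiler could go
   further and replace the whole loop with: if (n > 0) b = t; */
int invariant_hoisted(int n)
{
    int a = 5;
    int b = 0;
    int t = a + 3;   /* loop-invariant value, computed once */
    for (int i = 0; i < n; i++)
        b = t;
    return b;
}
```

Both functions return the same value for every `n`, which is exactly the condition that lets the optimizer make the change.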
- 01-12-2009 #3 Linux User
- Join Date: May 2008
- Location: NYC, moved from KS & MO
- Posts: 251
Thanks drl. I ran cmp on the two compiled binaries, and it reported only one differing byte; cloop2 is also one byte bigger than cloop:
```
$ cmp cloop cloop2
cloop cloop2 differ: byte 7305, line 3
[ Out of curiosity, I used bvi to open both files and the content for byte 7305 is: cloop -> FD, cloop2 -> FE ]
$ stat -c "%s %n" cloop cloop2
9001 cloop
9002 cloop2
```
- 01-12-2009 #4 Linux Engineer
- Join Date: Apr 2006
- Location: Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts: 1,117
Hi.
I think cmp stops at the first difference, so one should also look at the length, say with ls -l.
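cmp does stop at the first difference by default; `cmp -l` lists every differing byte instead. As a minimal sketch of the same idea in C, the hypothetical function below walks two files in parallel, counts the positions where the bytes differ, and also reports both file lengths, which is the check drl suggests doing with ls -l.

```c
#include <stdio.h>

/* Hypothetical helper: compares two files byte by byte.
   Returns the number of positions (within the shorter file) whose
   bytes differ and stores both file sizes through size_a/size_b.
   Returns -1 if either file cannot be opened. */
long count_differing_bytes(const char *path_a, const char *path_b,
                           long *size_a, long *size_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    long diffs = 0, na = 0, nb = 0;
    int ca, cb;

    if (fa == NULL || fb == NULL) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }

    /* Compare while both files still have data. */
    for (;;) {
        ca = getc(fa);
        cb = getc(fb);
        if (ca != EOF) na++;
        if (cb != EOF) nb++;
        if (ca == EOF || cb == EOF)
            break;
        if (ca != cb)
            diffs++;
    }

    /* Count whatever is left of the longer file so both sizes are
       complete (getc keeps returning EOF on the finished one). */
    while (getc(fa) != EOF) na++;
    while (getc(fb) != EOF) nb++;

    fclose(fa);
    fclose(fb);
    *size_a = na;
    *size_b = nb;
    return diffs;
}
```

Note that a single inserted byte shifts everything after it, so for two binaries whose lengths differ, a position-by-position count can be much larger than the number of actual changes.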
You could also look at the assembly code, option -S (upper case) -- it's very short. I don't know much about Intel, but it looked like almost everything was optimized away.
In another version, I placed print statements before and after the loop to force the compiler not to optimize the assignment away completely. The generated code is very similar; the version with the assignment statement just has a few extra instructions to store the result.
If you are interested, you could play with the optimization settings. You might find that the lowest level will leave the assignment in the loop.
The man page suggests that one may be surprised by optimized code -- some variables will be completely omitted, etc.
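Another way to keep the loop body from disappearing, as an alternative to the print statements, is to store the result in a `volatile` object: every access to a volatile object is a side effect the compiler must perform. In the minimal sketch below (the names `sink` and `cloop2_volatile` are hypothetical), the store must happen on every iteration at any optimization level, although the compiler may still fold `a + 3` into the constant 8.

```c
/* Hypothetical variant of cloop2.c's loop. Because 'sink' is
   volatile, the store inside the loop must be executed n times even
   at -O2; the addition itself may still be constant-folded. */
static volatile int sink;

int cloop2_volatile(int n)
{
    int a = 5;
    for (int i = 0; i < n; i++)
        sink = a + 3;
    return sink;   /* reads back the last stored value */
}
```

Compiling this with `gcc -O2 -S` and comparing it against the same function without `volatile` is one way to see which parts the optimizer removes.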
Best wishes ... cheers, drl
- 01-12-2009 #5 Linux User
- Join Date: May 2008
- Location: NYC, moved from KS & MO
- Posts: 251
The sizes reported by ls -l are identical to those from the stat command I used. I ran gcc again with the -S option, and here's the output:
Assembly code ( vi cloop.s cloop2.s -O ). The listing is cloop.s; cloop2.s differs only in its first line, `.file "cloop2.c"`:

```
        .file   "cloop.c"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $8, %esp
        movl    $0, -8(%ebp)
        movl    $0, -8(%ebp)
        jmp     .L2
.L3:
        addl    $1, -8(%ebp)
.L2:
        cmpl    $1999999998, -8(%ebp)
        jle     .L3
        addl    $8, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu 4.3.2-1ubuntu11) 4.3.2"
        .section        .note.GNU-stack,"",@progbits
```

It shows the assembly code is identical except for the source file names. It's interesting that the addition statement does not appear in the assembly code of cloop2. (I think the .L3 line just increments i, that is, it is actually doing i++.)
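The .L2/.L3 structure of the unoptimized output can be written back in C with `goto`. In this sketch (a hypothetical function, with the limit passed as a parameter so it can be tried on small values), the test `i < limit` is expressed as `i <= limit - 1`, which is the same rewrite that turned `i < 1999999999` into `cmpl $1999999998` followed by `jle` (jump if less than or equal).

```c
/* Hypothetical helper mirroring the -O0 assembly:
   L3 is the loop body (just i++), L2 is the test at the bottom. */
long count_iterations_goto(int limit)
{
    int i = 0;              /* movl $0, -8(%ebp)              */
    long iterations = 0;    /* extra counter, only for checking */

    goto L2;                /* jmp .L2: test before first pass */
L3:
    i++;                    /* addl $1, -8(%ebp)              */
    iterations++;
L2:
    if (i <= limit - 1)     /* cmpl $(limit-1), ...; jle .L3  */
        goto L3;
    return iterations;
}
```

Jumping to the test first is what lets a `for` loop run its body zero times when the condition is false from the start.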
Source code ( vi cloop.c cloop2.c -O ):

cloop.c:
```c
#include <stdio.h>

int main(void)
{
    register int i = 0;
    for (i = 0; i < 1999999999; i++)
        ;
}
```

cloop2.c:
```c
#include <stdio.h>

int main(void)
{
    register int i = 0;
    register int a = 5;
    register int b;
    for (i = 0; i < 1999999999; i++)
        b = a + 3;
}
```

Thanks again.