  1. #1
    Linux User
    Join Date
    May 2008
    Location
    NYC, moved from KS & MO
    Posts
    251

    weird C loop performance comparison result

    Hi all,
    Recently I was comparing for loops in C and Python, and it turns out that C is a lot faster -- as I expected. However, there's something I can't figure out. After comparing C and Python, I compared two simple C programs:
    Code:
    $ cat cloop.c
    #include<stdio.h>
    main() {
    register int i=0;
    for(i=0;i<999999999;i++)
    	;
    }
    Code:
    $ cat cloop2.c
    #include<stdio.h>
    main() {
    register int i=0;
    register int a=5;
    register int b;
    for(i=0;i<999999999;i++)
    	b=a+3;
    }
    In theory cloop should run faster than cloop2, but cloop2 is slightly faster than cloop, as my test results show:
    Code:
    $ date; ./cloop; date
    Sat Jan 10 18:36:19 EST 2009
    Sat Jan 10 18:36:24 EST 2009
    --->5 seconds
    [ BTW on the same machine it took Python 134 seconds to finish the same empty for loop ]
    
    $ date; ./cloop2; date
    Sat Jan 10 18:35:09 EST 2009
    Sat Jan 10 18:35:13 EST 2009
    --->4 seconds
    The results are also consistent, no matter how many times I run the programs or which one runs first.

    If I change the loop count to 1,999,999,999, both programs take the same time: 8 seconds.
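A small worked bound on these measurements (t and d are labels introduced here, not from the post): `date` prints whole seconds, so if the two timestamps differ by d seconds, the true elapsed time t can be almost a full second more or less than d. Applied to the readings above:

```latex
\begin{aligned}
d - 1 &< t < d + 1 \\
t_{\mathrm{cloop}} &\in (4,\ 6)\ \mathrm{s}, \qquad t_{\mathrm{cloop2}} \in (3,\ 5)\ \mathrm{s} \\
\frac{8\ \mathrm{s}}{1\,999\,999\,999\ \text{iterations}} &\approx 4.0\ \mathrm{ns\ per\ iteration}
\end{aligned}
```

The two intervals overlap between 4 and 5 seconds, so these readings alone cannot rule out the two loops taking the same time.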

    The above tests were done on my AMD 64 X2 3800 (Ubuntu 8.10, 686 kernel). No special optimisation was used when compiling. Can someone tell me why the empty C loop is no faster than the one that does the calculation?

  2. #2
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi.

    You would need to look at the compiled code to be certain, but I'd guess the compiler's analysis showed that the statement does not depend on the loop (it is loop-invariant), so it was optimized out of the loop ... cheers, drl
    Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
    90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
    We look forward to helping you with the challenge of the other 10%.
    ( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )

  3. #3
    Linux User
    Join Date
    May 2008
    Location
    NYC, moved from KS & MO
    Posts
    251
    Thanks, drl. I ran cmp on the two compiled binaries, and it turned out there's only a one-byte difference, and cloop2 is one byte bigger than cloop:
    Code:
    $ cmp cloop cloop2
    cloop cloop2 differ: byte 7305, line 3
    
    [ Out of curiosity, I used bvi to open both files and the content for byte 7305 is:
    cloop  -> FD
    cloop2 -> FE
    ]
    
    $ stat -c "%s %n" cloop cloop2
    9001 cloop
    9002 cloop2

  4. #4
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
    Posts
    1,117
    Hi.

    I think cmp stops at the first difference, so one should also look at the length, say with ls -l.

    You could also look at the assembly code with gcc's -S option (upper case) -- it's very short. I don't know much about Intel assembly, but it looked like almost everything was optimized away.

    In another version, I placed print statements before and after the loop to keep the compiler from completely optimizing the assignment away. The resulting code is very similar; it just has a few extra instructions to store the result in the version with the assignment statement.

    If you are interested, you could play with the optimization settings. You might find that the lowest level will leave the assignment in the loop.

    The man page suggests that one may be surprised by optimized code -- some variables will be completely omitted, etc.

    Best wishes ... cheers, drl

  5. #5
    Linux User
    Join Date
    May 2008
    Location
    NYC, moved from KS & MO
    Posts
    251
    The sizes reported by ls -l are identical to those from the stat command I used. I ran gcc again with the -S option, and here's the output:
    Assembly code (vi cloop.s cloop2.s -O):
    Code:
        .file   "cloop.c"                     |    .file   "cloop2.c"
        .text                                 |    .text
    .globl main                               |.globl main
        .type   main, @function               |    .type   main, @function
    main:                                     |main:
        leal    4(%esp), %ecx                 |    leal    4(%esp), %ecx
        andl    $-16, %esp                    |    andl    $-16, %esp
        pushl   -4(%ecx)                      |    pushl   -4(%ecx)
        pushl   %ebp                          |    pushl   %ebp
        movl    %esp, %ebp                    |    movl    %esp, %ebp
        pushl   %ecx                          |    pushl   %ecx
        subl    $8, %esp                      |    subl    $8, %esp
        movl    $0, -8(%ebp)                  |    movl    $0, -8(%ebp)
        movl    $0, -8(%ebp)                  |    movl    $0, -8(%ebp)
        jmp .L2                               |    jmp .L2
    .L3:                                      |.L3:
        addl    $1, -8(%ebp)                  |    addl    $1, -8(%ebp)
    .L2:                                      |.L2:
        cmpl    $1999999998, -8(%ebp)         |    cmpl    $1999999998, -8(%ebp)
        jle .L3                               |    jle .L3
        addl    $8, %esp                      |    addl    $8, %esp
        popl    %ecx                          |    popl    %ecx
        popl    %ebp                          |    popl    %ebp
        leal    -4(%ecx), %esp                |    leal    -4(%ecx), %esp
        ret                                   |    ret
        .size   main, .-main                  |    .size   main, .-main
        .ident  "GCC: (Ubuntu 4.3.2-1ubuntu11) 4.3.2"   |    .ident  "GCC: (Ubuntu 4.3.2-1ubuntu11) 4.3.2"
        .section    .note.GNU-stack,"",@progbits        |    .section    .note.GNU-stack,"",@progbits
    The assembly is identical except for the source file names. Interestingly, the addition statement does not appear anywhere in cloop2's assembly. (I think the code at .L3 just increments i, i.e., it is doing i++.)


    Source code (vi cloop.c cloop2.c -O):
    Code:
    #include<stdio.h>                         |#include<stdio.h>
    main() {                                  |main() {
    register int i=0;                         |register int i=0;
    for(i=0;i<1999999999;i++)                 |register int a=5;
        ;                                     |register int b;
    }                                         |for(i=0;i<1999999999;i++)
                                              |    b=a+3;
                                              |}
    Thanks again.
