Find the answer to your Linux question:
Page 1 of 3 1 2 3 LastLast
Results 1 to 10 of 21
Enjoy an ad free experience by logging in. Not a member yet? Register.
  1. #1

    need help with pthread programming on an smp


    so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i made a test program to see if i will get a performance increase from using threads on my smp and i came to a most curious result, this is the code:
    Code:
    #include <iostream>
    #include <iomanip>
    #include <stdio.h>
    #include <fstream>
    #include <math.h>
    #include <pthread.h>
    #define printf(...) fprintf (stdout, __VA_ARGS__); fflush(stdout);
    using namespace std;
    
    struct hello_world
    {
    double first;
    double second;
    double result;
    };
    
    void *myfunc(void *value)
    {
    hello_world *myhello = (hello_world*)value;
    for (int counter = myhello->first; counter <= myhello->second; counter++) 
      {
        myhello->result += sqrt((double)counter);
      }
    } 
    
    int main()
    {
    	pthread_t thread1, thread2;
    	hello_world *myhello;
    	hello_world *nexthello;
    	myhello = new hello_world;
    	nexthello = new hello_world;
    	myhello->first = 1.0;
    	myhello->second = 500000000.0;
    	nexthello->first = 500000001.0;
    	nexthello->second = 1000000000.0;
    	printf("set second, launching thread\n");
    	pthread_create( &thread1, NULL, myfunc, (void*)myhello);
    	pthread_create( &thread2, NULL, myfunc, (void*)nexthello);
    	printf("waiting\n");
    	pthread_join(thread1,NULL);
    	pthread_join(thread2,NULL);
    	printf("%lf",(myhello->result+ nexthello->result));
    
    	printf("done\n");
    }
    as you can see this calculates the sum of the squares of all numbers from 1 to 1,000,000,000 (got the idea from the beowulf howto, thought it should apply to the smp too) now i ran this doing it with 1 thread and i got this:
    $ time ./a.out
    myhello made
    set first
    set second, launching thread
    waiting
    done

    real 0m18.808s
    user 0m18.013s
    sys 0m0.028s

    then i ran it with 2 threads, i expected to see around 9-13 or so but i got this:
    $ time ./a.out
    set second, launching thread
    waiting
    1052463001done

    real 0m32.932s
    user 0m58.192s
    sys 0m0.112s
    (the first one didnt output because i didnt really care what the result was..)
    this is nearly twice the amount of time to do the SAME calculation.

    i dont understand this. am i doing something wrong here? i expected it would be around half the time or at least equal to the time of a single thread...so why does running 2 threads take TWICE as long? do i have to program it to explicitly make use of both processors and even if its only running on 1 processor...why would it take twice as long?

  2. #2
    Quote Originally Posted by the_ultimate_samurai View Post
    so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i made a test program to see if i will get a performance increase from using threads on my smp and i came to a most curious result, this is the code:
    Code:
    #include <iostream>
    #include <iomanip>
    #include <stdio.h>
    #include <fstream>
    #include <math.h>
    #include <pthread.h>
    #define printf(...) fprintf (stdout, __VA_ARGS__); fflush(stdout);
    using namespace std;
    
    struct hello_world
    {
    double first;
    double second;
    double result;
    };
    
    void *myfunc(void *value)
    {
    hello_world *myhello = (hello_world*)value;
    for (int counter = myhello->first; counter <= myhello->second; counter++) 
      {
        myhello->result += sqrt((double)counter);
      }
    } 
    
    int main()
    {
    	pthread_t thread1, thread2;
    	hello_world *myhello;
    	hello_world *nexthello;
    	myhello = new hello_world;
    	nexthello = new hello_world;
    	myhello->first = 1.0;
    	myhello->second = 500000000.0;
    	nexthello->first = 500000001.0;
    	nexthello->second = 1000000000.0;
    	printf("set second, launching thread\n");
    	pthread_create( &thread1, NULL, myfunc, (void*)myhello);
    	pthread_create( &thread2, NULL, myfunc, (void*)nexthello);
    	printf("waiting\n");
    	pthread_join(thread1,NULL);
    	pthread_join(thread2,NULL);
    	printf("%lf",(myhello->result+ nexthello->result));
    
    	printf("done\n");
    }
    as you can see this calculates the sum of the squares of all numbers from 1 to 1,000,000,000 (got the idea from the beowulf howto, thought it should apply to the smp too) now i ran this doing it with 1 thread and i got this:
    $ time ./a.out
    myhello made
    set first
    set second, launching thread
    waiting
    done

    real 0m18.808s
    user 0m18.013s
    sys 0m0.028s

    then i ran it with 2 threads, i expected to see around 9-13 or so but i got this:
    $ time ./a.out
    set second, launching thread
    waiting
    1052463001done

    real 0m32.932s
    user 0m58.192s
    sys 0m0.112s
    (the first one didnt output because i didnt really care what the result was..)
    this is nearly twice the amount of time to do the SAME calculation.

    i dont understand this. am i doing something wrong here? i expected it would be around half the time or at least equal to the time of a single thread...so why does running 2 threads take TWICE as long? do i have to program it to explicitly make use of both processors and even if its only running on 1 processor...why would it take twice as long?
    If you want each thread to use a different cpu at the same time, you will need to add the following at the beginning of your main():
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    and modify your pthread_create() calls to
    pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    pthread_create( &thread2, &attr, myfunc, (void*)nexthello);


    You may check whether only a single CPU is used when your program is running by executing the command "sar -P ALL 1 10" (on a separate window while your program is running).

    Hope this helps.
    -Steve

  3. #3
    well i had since added this to the function:
    Code:
    void *myfunc(void *value)
    {
    cpu_set_t mask; 
    hello_world *myhello = (hello_world*)value;
    CPU_ZERO(&mask);
    CPU_SET(myhello->affinity,&mask);
    sched_setaffinity(pthread_self(), sizeof(mask), &mask);
    printf("first is: %lf, second is %lf, checking if affinity is correct, affinity should be: %d is it: %d\n", myhello->first, myhello->second, myhello->affinity, CPU_ISSET(myhello->affinity, &mask));
    for (double counter = myhello->first; counter <= myhello->second; counter++) 
      {
        myhello->result += sqrt((double)counter);
      }
    printf("done with %d\n", myhello->affinity);
    }
    each thread has a different number for affinity (0 and 1) and when i check my ksysguard while its running i see that the user% goes from ~8-10 for each core to 98-100% for each core...which certainly indicates that it is running on both cores.
    though supposing it is only running on one core why would using 2 threads run twice as long as running 1 thread?

    i will give that a try but im still wondering why this would be the case..

    --edit--
    ok i tested what you said...no change...if anything it took a few seconds longer...

    --edit2--
    it seems even without explicity setting affinity the kernel automatically load balances to use both cpus.

    --edit3--
    i tested and confirmed that using only 1 thread DOES use only one core...so when using two i AM using 2 cores. (fork() doesnt help either...so its not a memory issue...)

    --edit--
    to further confound the issue, a recent test has given me even more wierd results...i split the program into 2 programs, each program doing 1/2 the work (basically a program doing what the threads already do) and then i ran it with this script:
    Code:
    #!/bin/bash
    #testing.sh
    ./smp1& ./smp2; echo "done";
    and ran
    Code:
    time ./testing.sh
    and got this:
    Code:
    real    0m8.241s
    user    0m8.049s
    sys     0m0.012s
    just what i expected to see from my multi-threaded program.
    when i run the multi-threaded program but make sure that all threads run on 1 cpu i get 16 seconds, the same as a single threaded program, but when running on both cpus i get 34...

    what the hell is going on?

  4. $spacer_open
    $spacer_close
  5. #4
    From your description, my guess was that your program was using only one CPU at a time (while it might be switching from one CPU to another during the course of your program execution).
    So if you use "sar -P ALL 1 100" to monitor your program execution, you might see something like(assuming you have only 2 CPUs):

    06:48:06 AM CPU %user
    06:48:07 AM all 50 (<- average of 99 and 0)
    06:48:07 AM 0 99
    06:48:07 AM 1 0
    or
    06:48:06 AM CPU %user
    06:48:07 AM all 50
    06:48:07 AM 0 0
    06:48:07 AM 1 99

    but never:
    06:48:06 AM CPU %user
    06:48:07 AM all 99
    06:48:07 AM 0 98
    06:48:07 AM 1 99

    This is only my guess. We need to verify whether this is case. Appreciate if you could post the "sar" output . Thanks.

    -Steve

  6. #5
    yeah....i didnt get that...the command you gave gave me a syntax error (the -P wouldnt accept ALL so i used -A)
    i get this:
    21:38:04 cpu %usr %nice %sys %irq %softirq %wait %idle _cpu_
    21:38:05 all 100 0 0 0 0 0 0
    0 100 0 0 0 0 0 0
    1 100 0 0 0 0 0 0

    --edit--
    also, i tested that the cpu affinity works, i set both threads with an affinity for cpu 0, and they only ran on 0, this is running with one having an affinity for ONLY 0 and the other having affinity for ONLY 1. so it shouldnt be able to switch around them.

    each certainly seems to be working on sepperate cpus...

  7. #6
    that is the sar output there...anything useful?

  8. #7
    Assuming we are seeing 100% utilization of both cpu's through out your multi-thread program execution, I would suggest to use "oprofile" to investigate further.

    If you haven't installed "oprofile" already, you may download the appropriate .rpm from internet for your Linux distribution and install.

    I am using openSuSe and I have "oprofile-0.9.4-50.11" (checked with "rpm -qa | grep oprofile") installed.

    I have prepared a test script using your program. Please run it after you have installed oprofile.
    Here's the script "profile.sh":

    NUMOTHREAD=2
    [ $# -eq 1 ] && [ $1 -eq 1 ] && NUMOTHREAD=1
    TESTDIR=/tmp
    PROG=forum
    rm -f $TESTDIR/$PROG




    # create test program source file
    cat > $TESTDIR/$PROG.c <<EOF
    #include <stdio.h>
    #include <math.h>
    #include <pthread.h>
    using namespace std;


    struct hello_world
    {
    double first;
    double second;
    double result;
    };

    void *myfunc(void *value)
    {
    hello_world *myhello = (hello_world*)value;
    for (int counter = myhello->first; counter <= myhello->second; counter++)
    {
    myhello->result += sqrt((double)counter);
    }
    }

    int main()
    {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    pthread_t thread1, thread2;
    hello_world *myhello;
    hello_world *nexthello;
    myhello = new hello_world;
    nexthello = new hello_world;
    myhello->first = 1.0;
    myhello->second = 500000000.0;
    nexthello->first = 500000001.0;
    nexthello->second = 1000000000.0;
    EOF

    if [ $NUMOTHREAD -eq 1 ]
    then
    cat >> $TESTDIR/$PROG.c <<EOF1
    pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    pthread_join(thread1,NULL);
    }
    EOF1

    else
    cat >> $TESTDIR/$PROG.c <<EOF2
    pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
    pthread_join(thread1,NULL);
    pthread_join(thread2,NULL);
    }
    EOF2
    fi


    # Compile
    g++ -g -lpthread -o $TESTDIR/forum $TESTDIR/$PROG.c

    # Profile setup and start
    /usr/bin/opcontrol --reset
    #/usr/bin/opcontrol --setup --no-vmlinux --separate=library --event=GLOBAL_POWER_EVENTS:750000:0x1:1:1 --separate=thread
    /usr/bin/opcontrol --setup --separate=library,thread --event=GLOBAL_POWER_EVENTS:750000:0x1:1:1
    /usr/bin/opcontrol --start
    sar 1 10 > $TESTDIR/sar$NUMOTHREAD.out &

    # execute
    /usr/bin/time -o $TESTDIR/profile$NUMOTHREAD.out $TESTDIR/$PROG

    /usr/bin/opcontrol --shutdown
    /usr/bin/opreport --long-filenames image:$TESTDIR/$PROG --threshold 1 >> $TESTDIR/profile$NUMOTHREAD.out
    opannotate image:$TESTDIR/$PROG --source >> $TESTDIR/profile$NUMOTHREAD.out
    cat $TESTDIR/sar$NUMOTHREAD.out >> $TESTDIR/profile$NUMOTHREAD.out
    Please perform the following:
    1. ./profile.sh 1
    2. ./profile.sh 2
    3. post /tmp/profile1.out and /tmp/profile2.out in your reply

    Hopefully this might give us some hints.
    -Steve

  9. #8
    fyi, here's my profile2.out when I execute "./profile.sh 2" on my single cpu system:

    38.73user 0.04system 0:38.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+309minor)pagefaults 0swaps
    CPU: P4 / Xeon, speed 3065.79 MHz (estimated)
    Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 750000
    Processes with a thread ID of 5503
    Processes with a thread ID of 5504
    Processes with a thread ID of 5505
    tid:5503| tid:5504| tid:5505|
    samples| %| samples| %| samples| %|
    ------------------------------------------------------
    6 100.000 76749 100.000 76802 100.000 /tmp/forum
    tid:5503| tid:5504| tid:5505|
    samples| %| samples| %| samples| %|
    ------------------------------------------------------
    6 100.000 0 0 0 0 /lib64/ld-2.8.so
    0 0 76749 100.000 76802 100.000 /tmp/forum
    /*
    * Command line: opannotate image:/tmp/forum --source
    *
    * Interpretation of command line:
    * Output annotated source file with samples
    * Output all files
    *
    * CPU: P4 / Xeon, speed 3065.79 MHz (estimated)
    * Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 750000
    * Processes with a thread ID of 5503
    * Processes with a thread ID of 5504
    * Processes with a thread ID of 5505
    */
    /*
    * Total samples for file : "/tmp/forum.c"
    *
    * 0 0 76749 100.000 76802 100.000
    */


    :#include <stdio.h>
    :#include <math.h>
    :#include <pthread.h>
    :using namespace std;
    :
    :
    :struct hello_world
    :{
    :double first;
    :double second;
    :double result;
    :};
    :
    :void *myfunc(void *value) /* myfunc(void*) total: 0 0 76749 100.000 76802 100.000 */
    :{
    :hello_world *myhello = (hello_world*)value;
    0 0 10522 13.7096 10124 13.1819 :for (int counter = myhello->first; counter <= myhello->second; counter++)
    : {
    0 0 66227 86.2904 66678 86.8181 : myhello->result += sqrt((double)counter);
    : }
    :}
    :
    :int main()
    :{
    : pthread_attr_t attr;
    : pthread_attr_init(&attr);
    : pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    : pthread_t thread1, thread2;
    : hello_world *myhello;
    : hello_world *nexthello;
    : myhello = new hello_world;
    : nexthello = new hello_world;
    : myhello->first = 1.0;
    : myhello->second = 500000000.0;
    : nexthello->first = 500000001.0;
    : nexthello->second = 1000000000.0;
    : pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    : pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
    : pthread_join(thread1,NULL);
    : pthread_join(thread2,NULL);
    :}
    Linux 2.6.25.5-1.1-default (linux-2biz) 05/28/09 _x86_64_

    09:11:54 CPU %user %nice %system %iowait %steal %idle
    09:11:55 all 99.00 0.00 1.00 0.00 0.00 0.00
    09:11:56 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:11:57 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:11:58 all 99.01 0.00 0.99 0.00 0.00 0.00
    09:11:59 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:12:00 all 99.00 0.00 1.00 0.00 0.00 0.00
    09:12:01 all 99.01 0.00 0.99 0.00 0.00 0.00
    09:12:02 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:12:03 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:12:04 all 99.01 0.00 0.99 0.00 0.00 0.00
    Average: all 99.50 0.00 0.50 0.00 0.00 0.00

  10. #9
    1:
    Code:
    9.04user 0.00system 0:09.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+255minor)pagefaults 0swaps
    
    Linux  debian  2.6.26-2-686  #1 SMP Thu Mar 26 01:08:11 UTC 2009  i686  05/28/2009
    
    02:57:31  cpu %usr %nice   %sys %irq %softirq    %wait %idle             _cpu_
    02:57:31  all   62     0      1    0        0        1    36
                0   22     0      3    0        0        1    74
                1  100     0      0    0        0        0     0
    02:57:32  all   51     0      0    0        0        0    48
                0    2     0      1    0        1        0    96
                1  100     0      0    0        0        0     0
    02:57:33  all   53     0      0    0        0        0    46
                0    7     0      1    0        0        0    92
                1  100     0      0    0        0        0     0
    02:57:34  all   54     0      1    0        0        1    44
                0    8     0      3    1        0        2    86
                1  100     0      0    0        0        0     0
    02:57:35  all   55     0      2    0        0        0    44
                0   10     0      3    0        0        0    87
                1  100     0      0    0        0        0     0
    02:57:36  all   55     0      0    0        0        0    44
                0   10     0      0    0        0        0    90
                1  100     0      0    0        0        0     0
    02:57:37  all   55     0      0    0        0        0    44
                0   10     0      2    0        0        1    87
                1  100     0      0    0        0        0     0
    02:57:38  all   54     0      2    0        0        0    44
                0    7     0      4    0        1        0    88
                1  100     0      0    0        0        0     0
    02:57:39  all   52     0      1    0        0        1    46
                0    6     0      2    0        0        2    90
                1   98     0      1    0        0        0     1
    02:57:40  all   26     0      5    0        0        0    69
                0    8     0      6    0        1        0    85
                1   44     0      3    0        0        0    53
    2:
    Code:
    61.75user 0.14system 0:33.21elapsed 186%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+255minor)pagefaults 0swaps
    
    Linux  debian  2.6.26-2-686  #1 SMP Thu Mar 26 01:08:11 UTC 2009  i686  05/28/2009
    
    02:57:43  cpu %usr %nice   %sys %irq %softirq    %wait %idle             _cpu_
    02:57:43  all   99     0      2    0        0        0     0
                0   99     0      3    0        0        0     0
                1  100     0      0    0        0        0     0
    02:57:44  all   99     0      0    0        0        0     0
                0   99     0      0    0        1        0     0
                1   99     0      1    0        0        0     0
    02:57:45  all  100     0      0    0        0        0     0
                0   99     0      1    0        0        0     0
                1  100     0      1    0        0        0     0
    02:57:46  all   99     0      0    0        0        0     0
                0   99     0      0    1        1        0     0
                1  100     0      0    0        0        0     0
    02:57:47  all  100     0      0    0        0        0     0
                0  100     0      0    0        0        0     0
                1   99     0      0    0        0        0     1
    02:57:48  all   99     0      1    0        0        0     0
                0   98     0      2    0        0        0     0
                1  100     0      1    0        0        0     0
    02:57:49  all  100     0      0    0        0        0     0
                0  100     0      0    0        0        0     0
                1  100     0      0    0        0        0     0
    02:57:50  all   99     0      1    0        0        0     0
                0   99     0      0    0        0        0     1
                1   99     0      1    0        0        0     0
    02:57:51  all   98     0      2    0        0        0     0
                0   99     0      1    0        0        0     0
                1   96     0      4    0        0        0     0
    02:57:52  all  100     0      0    0        0        0     0
                0   99     0      1    0        0        0     0
                1  100     0      0    0        0        0     0

  11. #10
    There is no profiling information in your outputs.
    Have you installed oprofile? Did you get any error messages when you execute .profile.sh?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •