Find the answer to your Linux question:
Page 1 of 3 1 2 3 LastLast
Results 1 to 10 of 21
so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i ...
Enjoy an ad free experience by logging in. Not a member yet? Register.
  1. #1
    Just Joined!
    Join Date
    Dec 2006
    Posts
    85

    need help with pthread programming on an smp


    so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i made a test program to see if i will get a performance increase from using threads on my smp and i came to a most curious result, this is the code:
    Code:
    #include <iostream>
    #include <iomanip>
    #include <stdio.h>
    #include <fstream>
    #include <math.h>
    #include <pthread.h>
    #define printf(...) fprintf (stdout, __VA_ARGS__); fflush(stdout);
    using namespace std;
    
    struct hello_world
    {
    double first;
    double second;
    double result;
    };
    
    void *myfunc(void *value)
    {
    hello_world *myhello = (hello_world*)value;
    for (int counter = myhello->first; counter <= myhello->second; counter++) 
      {
        myhello->result += sqrt((double)counter);
      }
    } 
    
    int main()
    {
    	pthread_t thread1, thread2;
    	hello_world *myhello;
    	hello_world *nexthello;
    	myhello = new hello_world;
    	nexthello = new hello_world;
    	myhello->first = 1.0;
    	myhello->second = 500000000.0;
    	nexthello->first = 500000001.0;
    	nexthello->second = 1000000000.0;
    	printf("set second, launching thread\n");
    	pthread_create( &thread1, NULL, myfunc, (void*)myhello);
    	pthread_create( &thread2, NULL, myfunc, (void*)nexthello);
    	printf("waiting\n");
    	pthread_join(thread1,NULL);
    	pthread_join(thread2,NULL);
    	printf("%lf",(myhello->result+ nexthello->result));
    
    	printf("done\n");
    }
    as you can see this calculates the sum of the squares of all numbers from 1 to 1,000,000,000 (got the idea from the beowulf howto, thought it should apply to the smp too) now i ran this doing it with 1 thread and i got this:
    $ time ./a.out
    myhello made
    set first
    set second, launching thread
    waiting
    done

    real 0m18.808s
    user 0m18.013s
    sys 0m0.028s

    then i ran it with 2 threads, i expected to see around 9-13 or so but i got this:
    $ time ./a.out
    set second, launching thread
    waiting
    1052463001done

    real 0m32.932s
    user 0m58.192s
    sys 0m0.112s
    (the first one didnt output because i didnt really care what the result was..)
    this is nearly twice the amount of time to do the SAME calculation.

    i dont understand this. am i doing something wrong here? i expected it would be around half the time or at least equal to the time of a single thread...so why does running 2 threads take TWICE as long? do i have to program it to explicitly make use of both processors and even if its only running on 1 processor...why would it take twice as long?

  2. #2
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Quote Originally Posted by the_ultimate_samurai View Post
    so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i made a test program to see if i will get a performance increase from using threads on my smp and i came to a most curious result, this is the code:
    Code:
    #include <iostream>
    #include <iomanip>
    #include <stdio.h>
    #include <fstream>
    #include <math.h>
    #include <pthread.h>
    #define printf(...) fprintf (stdout, __VA_ARGS__); fflush(stdout);
    using namespace std;
    
    struct hello_world
    {
    double first;
    double second;
    double result;
    };
    
    void *myfunc(void *value)
    {
    hello_world *myhello = (hello_world*)value;
    for (int counter = myhello->first; counter <= myhello->second; counter++) 
      {
        myhello->result += sqrt((double)counter);
      }
    } 
    
    int main()
    {
    	pthread_t thread1, thread2;
    	hello_world *myhello;
    	hello_world *nexthello;
    	myhello = new hello_world;
    	nexthello = new hello_world;
    	myhello->first = 1.0;
    	myhello->second = 500000000.0;
    	nexthello->first = 500000001.0;
    	nexthello->second = 1000000000.0;
    	printf("set second, launching thread\n");
    	pthread_create( &thread1, NULL, myfunc, (void*)myhello);
    	pthread_create( &thread2, NULL, myfunc, (void*)nexthello);
    	printf("waiting\n");
    	pthread_join(thread1,NULL);
    	pthread_join(thread2,NULL);
    	printf("%lf",(myhello->result+ nexthello->result));
    
    	printf("done\n");
    }
    as you can see this calculates the sum of the squares of all numbers from 1 to 1,000,000,000 (got the idea from the beowulf howto, thought it should apply to the smp too) now i ran this doing it with 1 thread and i got this:
    $ time ./a.out
    myhello made
    set first
    set second, launching thread
    waiting
    done

    real 0m18.808s
    user 0m18.013s
    sys 0m0.028s

    then i ran it with 2 threads, i expected to see around 9-13 or so but i got this:
    $ time ./a.out
    set second, launching thread
    waiting
    1052463001done

    real 0m32.932s
    user 0m58.192s
    sys 0m0.112s
    (the first one didnt output because i didnt really care what the result was..)
    this is nearly twice the amount of time to do the SAME calculation.

    i dont understand this. am i doing something wrong here? i expected it would be around half the time or at least equal to the time of a single thread...so why does running 2 threads take TWICE as long? do i have to program it to explicitly make use of both processors and even if its only running on 1 processor...why would it take twice as long?
    If you want each thread to use a different cpu at the same time, you will need to add the following at the beginning of your main():
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    and modify your pthread_create() calls to
    pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    pthread_create( &thread2, &attr, myfunc, (void*)nexthello);


    You may check whether only a single CPU is used when your program is running by executing the command "sar -P ALL 1 10" (on a separate window while your program is running).

    Hope this helps.
    -Steve

  3. #3
    Just Joined!
    Join Date
    Dec 2006
    Posts
    85
    well i had since added this to the function:
    Code:
    void *myfunc(void *value)
    {
    cpu_set_t mask; 
    hello_world *myhello = (hello_world*)value;
    CPU_ZERO(&mask);
    CPU_SET(myhello->affinity,&mask);
    sched_setaffinity(pthread_self(), sizeof(mask), &mask);
    printf("first is: %lf, second is %lf, checking if affinity is correct, affinity should be: %d is it: %d\n", myhello->first, myhello->second, myhello->affinity, CPU_ISSET(myhello->affinity, &mask));
    for (double counter = myhello->first; counter <= myhello->second; counter++) 
      {
        myhello->result += sqrt((double)counter);
      }
    printf("done with %d\n", myhello->affinity);
    }
    each thread has a different number for affinity (0 and 1) and when i check my ksysguard while its running i see that the user% goes from ~8-10 for each core to 98-100% for each core...which certainly indicates that it is running on both cores.
    though supposing it is only running on one core why would using 2 threads run twice as long as running 1 thread?

    i will give that a try but im still wondering why this would be the case..

    --edit--
    ok i tested what you said...no change...if anything it took a few seconds longer...

    --edit2--
    it seems even without explicity setting affinity the kernel automatically load balances to use both cpus.

    --edit3--
    i tested and confirmed that using only 1 thread DOES use only one core...so when using two i AM using 2 cores. (fork() doesnt help either...so its not a memory issue...)

    --edit--
    to further confound the issue, a recent test has given me even more wierd results...i split the program into 2 programs, each program doing 1/2 the work (basically a program doing what the threads already do) and then i ran it with this script:
    Code:
    #!/bin/bash
    #testing.sh
    ./smp1& ./smp2; echo "done";
    and ran
    Code:
    time ./testing.sh
    and got this:
    Code:
    real    0m8.241s
    user    0m8.049s
    sys     0m0.012s
    just what i expected to see from my multi-threaded program.
    when i run the multi-threaded program but make sure that all threads run on 1 cpu i get 16 seconds, the same as a single threaded program, but when running on both cpus i get 34...

    what the hell is going on?

  4. #4
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    From your description, my guess was that your program was using only one CPU at a time (while it might be switching from one CPU to another during the course of your program execution).
    So if you use "sar -P ALL 1 100" to monitor your program execution, you might see something like(assuming you have only 2 CPUs):

    06:48:06 AM CPU %user
    06:48:07 AM all 50 (<- average of 99 and 0)
    06:48:07 AM 0 99
    06:48:07 AM 1 0
    or
    06:48:06 AM CPU %user
    06:48:07 AM all 50
    06:48:07 AM 0 0
    06:48:07 AM 1 99

    but never:
    06:48:06 AM CPU %user
    06:48:07 AM all 99
    06:48:07 AM 0 98
    06:48:07 AM 1 99

    This is only my guess. We need to verify whether this is case. Appreciate if you could post the "sar" output . Thanks.

    -Steve

  5. #5
    Just Joined!
    Join Date
    Dec 2006
    Posts
    85
    yeah....i didnt get that...the command you gave gave me a syntax error (the -P wouldnt accept ALL so i used -A)
    i get this:
    21:38:04 cpu %usr %nice %sys %irq %softirq %wait %idle _cpu_
    21:38:05 all 100 0 0 0 0 0 0
    0 100 0 0 0 0 0 0
    1 100 0 0 0 0 0 0

    --edit--
    also, i tested that the cpu affinity works, i set both threads with an affinity for cpu 0, and they only ran on 0, this is running with one having an affinity for ONLY 0 and the other having affinity for ONLY 1. so it shouldnt be able to switch around them.

    each certainly seems to be working on sepperate cpus...

  6. #6
    Just Joined!
    Join Date
    Dec 2006
    Posts
    85
    that is the sar output there...anything useful?

  7. #7
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Assuming we are seeing 100% utilization of both cpu's through out your multi-thread program execution, I would suggest to use "oprofile" to investigate further.

    If you haven't installed "oprofile" already, you may download the appropriate .rpm from internet for your Linux distribution and install.

    I am using openSuSe and I have "oprofile-0.9.4-50.11" (checked with "rpm -qa | grep oprofile") installed.

    I have prepared a test script using your program. Please run it after you have installed oprofile.
    Here's the script "profile.sh":

    NUMOTHREAD=2
    [ $# -eq 1 ] && [ $1 -eq 1 ] && NUMOTHREAD=1
    TESTDIR=/tmp
    PROG=forum
    rm -f $TESTDIR/$PROG




    # create test program source file
    cat > $TESTDIR/$PROG.c <<EOF
    #include <stdio.h>
    #include <math.h>
    #include <pthread.h>
    using namespace std;


    struct hello_world
    {
    double first;
    double second;
    double result;
    };

    void *myfunc(void *value)
    {
    hello_world *myhello = (hello_world*)value;
    for (int counter = myhello->first; counter <= myhello->second; counter++)
    {
    myhello->result += sqrt((double)counter);
    }
    }

    int main()
    {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    pthread_t thread1, thread2;
    hello_world *myhello;
    hello_world *nexthello;
    myhello = new hello_world;
    nexthello = new hello_world;
    myhello->first = 1.0;
    myhello->second = 500000000.0;
    nexthello->first = 500000001.0;
    nexthello->second = 1000000000.0;
    EOF

    if [ $NUMOTHREAD -eq 1 ]
    then
    cat >> $TESTDIR/$PROG.c <<EOF1
    pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    pthread_join(thread1,NULL);
    }
    EOF1

    else
    cat >> $TESTDIR/$PROG.c <<EOF2
    pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
    pthread_join(thread1,NULL);
    pthread_join(thread2,NULL);
    }
    EOF2
    fi


    # Compile
    g++ -g -lpthread -o $TESTDIR/forum $TESTDIR/$PROG.c

    # Profile setup and start
    /usr/bin/opcontrol --reset
    #/usr/bin/opcontrol --setup --no-vmlinux --separate=library --event=GLOBAL_POWER_EVENTS:750000:0x1:1:1 --separate=thread
    /usr/bin/opcontrol --setup --separate=library,thread --event=GLOBAL_POWER_EVENTS:750000:0x1:1:1
    /usr/bin/opcontrol --start
    sar 1 10 > $TESTDIR/sar$NUMOTHREAD.out &

    # execute
    /usr/bin/time -o $TESTDIR/profile$NUMOTHREAD.out $TESTDIR/$PROG

    /usr/bin/opcontrol --shutdown
    /usr/bin/opreport --long-filenames image:$TESTDIR/$PROG --threshold 1 >> $TESTDIR/profile$NUMOTHREAD.out
    opannotate image:$TESTDIR/$PROG --source >> $TESTDIR/profile$NUMOTHREAD.out
    cat $TESTDIR/sar$NUMOTHREAD.out >> $TESTDIR/profile$NUMOTHREAD.out
    Please perform the following:
    1. ./profile.sh 1
    2. ./profile.sh 2
    3. post /tmp/profile1.out and /tmp/profile2.out in your reply

    Hopefully this might give us some hints.
    -Steve

  8. #8
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    fyi, here's my profile2.out when I execute "./profile.sh 2" on my single cpu system:

    38.73user 0.04system 0:38.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+309minor)pagefaults 0swaps
    CPU: P4 / Xeon, speed 3065.79 MHz (estimated)
    Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 750000
    Processes with a thread ID of 5503
    Processes with a thread ID of 5504
    Processes with a thread ID of 5505
    tid:5503| tid:5504| tid:5505|
    samples| %| samples| %| samples| %|
    ------------------------------------------------------
    6 100.000 76749 100.000 76802 100.000 /tmp/forum
    tid:5503| tid:5504| tid:5505|
    samples| %| samples| %| samples| %|
    ------------------------------------------------------
    6 100.000 0 0 0 0 /lib64/ld-2.8.so
    0 0 76749 100.000 76802 100.000 /tmp/forum
    /*
    * Command line: opannotate image:/tmp/forum --source
    *
    * Interpretation of command line:
    * Output annotated source file with samples
    * Output all files
    *
    * CPU: P4 / Xeon, speed 3065.79 MHz (estimated)
    * Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 750000
    * Processes with a thread ID of 5503
    * Processes with a thread ID of 5504
    * Processes with a thread ID of 5505
    */
    /*
    * Total samples for file : "/tmp/forum.c"
    *
    * 0 0 76749 100.000 76802 100.000
    */


    :#include <stdio.h>
    :#include <math.h>
    :#include <pthread.h>
    :using namespace std;
    :
    :
    :struct hello_world
    :{
    :double first;
    :double second;
    :double result;
    :};
    :
    :void *myfunc(void *value) /* myfunc(void*) total: 0 0 76749 100.000 76802 100.000 */
    :{
    :hello_world *myhello = (hello_world*)value;
    0 0 10522 13.7096 10124 13.1819 :for (int counter = myhello->first; counter <= myhello->second; counter++)
    : {
    0 0 66227 86.2904 66678 86.8181 : myhello->result += sqrt((double)counter);
    : }
    :}
    :
    :int main()
    :{
    : pthread_attr_t attr;
    : pthread_attr_init(&attr);
    : pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    : pthread_t thread1, thread2;
    : hello_world *myhello;
    : hello_world *nexthello;
    : myhello = new hello_world;
    : nexthello = new hello_world;
    : myhello->first = 1.0;
    : myhello->second = 500000000.0;
    : nexthello->first = 500000001.0;
    : nexthello->second = 1000000000.0;
    : pthread_create( &thread1, &attr, myfunc, (void*)myhello);
    : pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
    : pthread_join(thread1,NULL);
    : pthread_join(thread2,NULL);
    :}
    Linux 2.6.25.5-1.1-default (linux-2biz) 05/28/09 _x86_64_

    09:11:54 CPU %user %nice %system %iowait %steal %idle
    09:11:55 all 99.00 0.00 1.00 0.00 0.00 0.00
    09:11:56 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:11:57 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:11:58 all 99.01 0.00 0.99 0.00 0.00 0.00
    09:11:59 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:12:00 all 99.00 0.00 1.00 0.00 0.00 0.00
    09:12:01 all 99.01 0.00 0.99 0.00 0.00 0.00
    09:12:02 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:12:03 all 100.00 0.00 0.00 0.00 0.00 0.00
    09:12:04 all 99.01 0.00 0.99 0.00 0.00 0.00
    Average: all 99.50 0.00 0.50 0.00 0.00 0.00

  9. #9
    Just Joined!
    Join Date
    Dec 2006
    Posts
    85
    1:
    Code:
    9.04user 0.00system 0:09.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+255minor)pagefaults 0swaps
    
    Linux  debian  2.6.26-2-686  #1 SMP Thu Mar 26 01:08:11 UTC 2009  i686  05/28/2009
    
    02:57:31  cpu %usr %nice   %sys %irq %softirq    %wait %idle             _cpu_
    02:57:31  all   62     0      1    0        0        1    36
                0   22     0      3    0        0        1    74
                1  100     0      0    0        0        0     0
    02:57:32  all   51     0      0    0        0        0    48
                0    2     0      1    0        1        0    96
                1  100     0      0    0        0        0     0
    02:57:33  all   53     0      0    0        0        0    46
                0    7     0      1    0        0        0    92
                1  100     0      0    0        0        0     0
    02:57:34  all   54     0      1    0        0        1    44
                0    8     0      3    1        0        2    86
                1  100     0      0    0        0        0     0
    02:57:35  all   55     0      2    0        0        0    44
                0   10     0      3    0        0        0    87
                1  100     0      0    0        0        0     0
    02:57:36  all   55     0      0    0        0        0    44
                0   10     0      0    0        0        0    90
                1  100     0      0    0        0        0     0
    02:57:37  all   55     0      0    0        0        0    44
                0   10     0      2    0        0        1    87
                1  100     0      0    0        0        0     0
    02:57:38  all   54     0      2    0        0        0    44
                0    7     0      4    0        1        0    88
                1  100     0      0    0        0        0     0
    02:57:39  all   52     0      1    0        0        1    46
                0    6     0      2    0        0        2    90
                1   98     0      1    0        0        0     1
    02:57:40  all   26     0      5    0        0        0    69
                0    8     0      6    0        1        0    85
                1   44     0      3    0        0        0    53
    2:
    Code:
    61.75user 0.14system 0:33.21elapsed 186%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+255minor)pagefaults 0swaps
    
    Linux  debian  2.6.26-2-686  #1 SMP Thu Mar 26 01:08:11 UTC 2009  i686  05/28/2009
    
    02:57:43  cpu %usr %nice   %sys %irq %softirq    %wait %idle             _cpu_
    02:57:43  all   99     0      2    0        0        0     0
                0   99     0      3    0        0        0     0
                1  100     0      0    0        0        0     0
    02:57:44  all   99     0      0    0        0        0     0
                0   99     0      0    0        1        0     0
                1   99     0      1    0        0        0     0
    02:57:45  all  100     0      0    0        0        0     0
                0   99     0      1    0        0        0     0
                1  100     0      1    0        0        0     0
    02:57:46  all   99     0      0    0        0        0     0
                0   99     0      0    1        1        0     0
                1  100     0      0    0        0        0     0
    02:57:47  all  100     0      0    0        0        0     0
                0  100     0      0    0        0        0     0
                1   99     0      0    0        0        0     1
    02:57:48  all   99     0      1    0        0        0     0
                0   98     0      2    0        0        0     0
                1  100     0      1    0        0        0     0
    02:57:49  all  100     0      0    0        0        0     0
                0  100     0      0    0        0        0     0
                1  100     0      0    0        0        0     0
    02:57:50  all   99     0      1    0        0        0     0
                0   99     0      0    0        0        0     1
                1   99     0      1    0        0        0     0
    02:57:51  all   98     0      2    0        0        0     0
                0   99     0      1    0        0        0     0
                1   96     0      4    0        0        0     0
    02:57:52  all  100     0      0    0        0        0     0
                0   99     0      1    0        0        0     0
                1  100     0      0    0        0        0     0

  10. #10
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    There is no profiling information in your outputs.
    Have you installed oprofile? Did you get any error messages when you execute .profile.sh?

Page 1 of 3 1 2 3 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •