Results 1 to 10 of 21
so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i ...
- 05-22-2009 #1Just Joined!
- Join Date
- Dec 2006
- Posts
- 85
need help with pthread programming on an smp
so i have been testing things out with programming on an smp (since i just got a new dual core system so i thought id check it out) so i made a test program to see if i will get a performance increase from using threads on my smp and i came to a most curious result, this is the code:
as you can see this calculates the sum of the squares of all numbers from 1 to 1,000,000,000 (got the idea from the beowulf howto, thought it should apply to the smp too) now i ran this doing it with 1 thread and i got this:Code:#include <iostream> #include <iomanip> #include <stdio.h> #include <fstream> #include <math.h> #include <pthread.h> #define printf(...) fprintf (stdout, __VA_ARGS__); fflush(stdout); using namespace std; struct hello_world { double first; double second; double result; }; void *myfunc(void *value) { hello_world *myhello = (hello_world*)value; for (int counter = myhello->first; counter <= myhello->second; counter++) { myhello->result += sqrt((double)counter); } } int main() { pthread_t thread1, thread2; hello_world *myhello; hello_world *nexthello; myhello = new hello_world; nexthello = new hello_world; myhello->first = 1.0; myhello->second = 500000000.0; nexthello->first = 500000001.0; nexthello->second = 1000000000.0; printf("set second, launching thread\n"); pthread_create( &thread1, NULL, myfunc, (void*)myhello); pthread_create( &thread2, NULL, myfunc, (void*)nexthello); printf("waiting\n"); pthread_join(thread1,NULL); pthread_join(thread2,NULL); printf("%lf",(myhello->result+ nexthello->result)); printf("done\n"); }
$ time ./a.out
myhello made
set first
set second, launching thread
waiting
done
real 0m18.808s
user 0m18.013s
sys 0m0.028s
then i ran it with 2 threads, i expected to see around 9-13 or so but i got this:
$ time ./a.out
set second, launching thread
waiting
1052463001done
real 0m32.932s
user 0m58.192s
sys 0m0.112s
(the first one didnt output because i didnt really care what the result was..)
this is nearly twice the amount of time to do the SAME calculation.
i dont understand this. am i doing something wrong here? i expected it would be around half the time or at least equal to the time of a single thread...so why does running 2 threads take TWICE as long? do i have to program it to explicitly make use of both processors and even if its only running on 1 processor...why would it take twice as long?
- 05-22-2009 #2Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
If you want each thread to use a different cpu at the same time, you will need to add the following at the beginning of your main():
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
and modify your pthread_create() calls to
pthread_create( &thread1, &attr, myfunc, (void*)myhello);
pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
You may check whether only a single CPU is used when your program is running by executing the command "sar -P ALL 1 10" (on a separate window while your program is running).
Hope this helps.
-Steve
- 05-22-2009 #3Just Joined!
- Join Date
- Dec 2006
- Posts
- 85
well i had since added this to the function:
each thread has a different number for affinity (0 and 1) and when i check my ksysguard while its running i see that the user% goes from ~8-10 for each core to 98-100% for each core...which certainly indicates that it is running on both cores.Code:void *myfunc(void *value) { cpu_set_t mask; hello_world *myhello = (hello_world*)value; CPU_ZERO(&mask); CPU_SET(myhello->affinity,&mask); sched_setaffinity(pthread_self(), sizeof(mask), &mask); printf("first is: %lf, second is %lf, checking if affinity is correct, affinity should be: %d is it: %d\n", myhello->first, myhello->second, myhello->affinity, CPU_ISSET(myhello->affinity, &mask)); for (double counter = myhello->first; counter <= myhello->second; counter++) { myhello->result += sqrt((double)counter); } printf("done with %d\n", myhello->affinity); }
though supposing it is only running on one core why would using 2 threads run twice as long as running 1 thread?
i will give that a try but im still wondering why this would be the case..
--edit--
ok i tested what you said...no change...if anything it took a few seconds longer...
--edit2--
it seems even without explicity setting affinity the kernel automatically load balances to use both cpus.
--edit3--
i tested and confirmed that using only 1 thread DOES use only one core...so when using two i AM using 2 cores. (fork() doesnt help either...so its not a memory issue...)
--edit--
to further confound the issue, a recent test has given me even more wierd results...i split the program into 2 programs, each program doing 1/2 the work (basically a program doing what the threads already do) and then i ran it with this script:
and ranCode:#!/bin/bash #testing.sh ./smp1& ./smp2; echo "done";
and got this:Code:time ./testing.sh
just what i expected to see from my multi-threaded program.Code:real 0m8.241s user 0m8.049s sys 0m0.012s
when i run the multi-threaded program but make sure that all threads run on 1 cpu i get 16 seconds, the same as a single threaded program, but when running on both cpus i get 34...
what the hell is going on?
- 05-23-2009 #4Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
From your description, my guess was that your program was using only one CPU at a time (while it might be switching from one CPU to another during the course of your program execution).
So if you use "sar -P ALL 1 100" to monitor your program execution, you might see something like(assuming you have only 2 CPUs):
06:48:06 AM CPU %user
06:48:07 AM all 50 (<- average of 99 and 0)
06:48:07 AM 0 99
06:48:07 AM 1 0
or
06:48:06 AM CPU %user
06:48:07 AM all 50
06:48:07 AM 0 0
06:48:07 AM 1 99
but never:
06:48:06 AM CPU %user
06:48:07 AM all 99
06:48:07 AM 0 98
06:48:07 AM 1 99
This is only my guess. We need to verify whether this is case. Appreciate if you could post the "sar" output . Thanks.
-Steve
- 05-23-2009 #5Just Joined!
- Join Date
- Dec 2006
- Posts
- 85
yeah....i didnt get that...the command you gave gave me a syntax error (the -P wouldnt accept ALL so i used -A)
i get this:
21:38:04 cpu %usr %nice %sys %irq %softirq %wait %idle _cpu_
21:38:05 all 100 0 0 0 0 0 0
0 100 0 0 0 0 0 0
1 100 0 0 0 0 0 0
--edit--
also, i tested that the cpu affinity works, i set both threads with an affinity for cpu 0, and they only ran on 0, this is running with one having an affinity for ONLY 0 and the other having affinity for ONLY 1. so it shouldnt be able to switch around them.
each certainly seems to be working on sepperate cpus...
- 05-24-2009 #6Just Joined!
- Join Date
- Dec 2006
- Posts
- 85
that is the sar output there...anything useful?
- 05-28-2009 #7Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
Assuming we are seeing 100% utilization of both cpu's through out your multi-thread program execution, I would suggest to use "oprofile" to investigate further.
If you haven't installed "oprofile" already, you may download the appropriate .rpm from internet for your Linux distribution and install.
I am using openSuSe and I have "oprofile-0.9.4-50.11" (checked with "rpm -qa | grep oprofile") installed.
I have prepared a test script using your program. Please run it after you have installed oprofile.
Here's the script "profile.sh":
Please perform the following:NUMOTHREAD=2
[ $# -eq 1 ] && [ $1 -eq 1 ] && NUMOTHREAD=1
TESTDIR=/tmp
PROG=forum
rm -f $TESTDIR/$PROG
# create test program source file
cat > $TESTDIR/$PROG.c <<EOF
#include <stdio.h>
#include <math.h>
#include <pthread.h>
using namespace std;
struct hello_world
{
double first;
double second;
double result;
};
void *myfunc(void *value)
{
hello_world *myhello = (hello_world*)value;
for (int counter = myhello->first; counter <= myhello->second; counter++)
{
myhello->result += sqrt((double)counter);
}
}
int main()
{
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
pthread_t thread1, thread2;
hello_world *myhello;
hello_world *nexthello;
myhello = new hello_world;
nexthello = new hello_world;
myhello->first = 1.0;
myhello->second = 500000000.0;
nexthello->first = 500000001.0;
nexthello->second = 1000000000.0;
EOF
if [ $NUMOTHREAD -eq 1 ]
then
cat >> $TESTDIR/$PROG.c <<EOF1
pthread_create( &thread1, &attr, myfunc, (void*)myhello);
pthread_join(thread1,NULL);
}
EOF1
else
cat >> $TESTDIR/$PROG.c <<EOF2
pthread_create( &thread1, &attr, myfunc, (void*)myhello);
pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
}
EOF2
fi
# Compile
g++ -g -lpthread -o $TESTDIR/forum $TESTDIR/$PROG.c
# Profile setup and start
/usr/bin/opcontrol --reset
#/usr/bin/opcontrol --setup --no-vmlinux --separate=library --event=GLOBAL_POWER_EVENTS:750000:0x1:1:1 --separate=thread
/usr/bin/opcontrol --setup --separate=library,thread --event=GLOBAL_POWER_EVENTS:750000:0x1:1:1
/usr/bin/opcontrol --start
sar 1 10 > $TESTDIR/sar$NUMOTHREAD.out &
# execute
/usr/bin/time -o $TESTDIR/profile$NUMOTHREAD.out $TESTDIR/$PROG
/usr/bin/opcontrol --shutdown
/usr/bin/opreport --long-filenames image:$TESTDIR/$PROG --threshold 1 >> $TESTDIR/profile$NUMOTHREAD.out
opannotate image:$TESTDIR/$PROG --source >> $TESTDIR/profile$NUMOTHREAD.out
cat $TESTDIR/sar$NUMOTHREAD.out >> $TESTDIR/profile$NUMOTHREAD.out
1. ./profile.sh 1
2. ./profile.sh 2
3. post /tmp/profile1.out and /tmp/profile2.out in your reply
Hopefully this might give us some hints.
-Steve
- 05-28-2009 #8Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
fyi, here's my profile2.out when I execute "./profile.sh 2" on my single cpu system:
38.73user 0.04system 0:38.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+309minor)pagefaults 0swaps
CPU: P4 / Xeon, speed 3065.79 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 750000
Processes with a thread ID of 5503
Processes with a thread ID of 5504
Processes with a thread ID of 5505
tid:5503| tid:5504| tid:5505|
samples| %| samples| %| samples| %|
------------------------------------------------------
6 100.000 76749 100.000 76802 100.000 /tmp/forum
tid:5503| tid:5504| tid:5505|
samples| %| samples| %| samples| %|
------------------------------------------------------
6 100.000 0 0 0 0 /lib64/ld-2.8.so
0 0 76749 100.000 76802 100.000 /tmp/forum
/*
* Command line: opannotate image:/tmp/forum --source
*
* Interpretation of command line:
* Output annotated source file with samples
* Output all files
*
* CPU: P4 / Xeon, speed 3065.79 MHz (estimated)
* Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 750000
* Processes with a thread ID of 5503
* Processes with a thread ID of 5504
* Processes with a thread ID of 5505
*/
/*
* Total samples for file : "/tmp/forum.c"
*
* 0 0 76749 100.000 76802 100.000
*/
:#include <stdio.h>
:#include <math.h>
:#include <pthread.h>
:using namespace std;
:
:
:struct hello_world
:{
:double first;
:double second;
:double result;
:};
:
:void *myfunc(void *value) /* myfunc(void*) total: 0 0 76749 100.000 76802 100.000 */
:{
:hello_world *myhello = (hello_world*)value;
0 0 10522 13.7096 10124 13.1819 :for (int counter = myhello->first; counter <= myhello->second; counter++)
: {
0 0 66227 86.2904 66678 86.8181 : myhello->result += sqrt((double)counter);
: }
:}
:
:int main()
:{
: pthread_attr_t attr;
: pthread_attr_init(&attr);
: pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
: pthread_t thread1, thread2;
: hello_world *myhello;
: hello_world *nexthello;
: myhello = new hello_world;
: nexthello = new hello_world;
: myhello->first = 1.0;
: myhello->second = 500000000.0;
: nexthello->first = 500000001.0;
: nexthello->second = 1000000000.0;
: pthread_create( &thread1, &attr, myfunc, (void*)myhello);
: pthread_create( &thread2, &attr, myfunc, (void*)nexthello);
: pthread_join(thread1,NULL);
: pthread_join(thread2,NULL);
:}
Linux 2.6.25.5-1.1-default (linux-2biz) 05/28/09 _x86_64_
09:11:54 CPU %user %nice %system %iowait %steal %idle
09:11:55 all 99.00 0.00 1.00 0.00 0.00 0.00
09:11:56 all 100.00 0.00 0.00 0.00 0.00 0.00
09:11:57 all 100.00 0.00 0.00 0.00 0.00 0.00
09:11:58 all 99.01 0.00 0.99 0.00 0.00 0.00
09:11:59 all 100.00 0.00 0.00 0.00 0.00 0.00
09:12:00 all 99.00 0.00 1.00 0.00 0.00 0.00
09:12:01 all 99.01 0.00 0.99 0.00 0.00 0.00
09:12:02 all 100.00 0.00 0.00 0.00 0.00 0.00
09:12:03 all 100.00 0.00 0.00 0.00 0.00 0.00
09:12:04 all 99.01 0.00 0.99 0.00 0.00 0.00
Average: all 99.50 0.00 0.50 0.00 0.00 0.00
- 05-28-2009 #9Just Joined!
- Join Date
- Dec 2006
- Posts
- 85
1:
2:Code:9.04user 0.00system 0:09.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+255minor)pagefaults 0swaps Linux debian 2.6.26-2-686 #1 SMP Thu Mar 26 01:08:11 UTC 2009 i686 05/28/2009 02:57:31 cpu %usr %nice %sys %irq %softirq %wait %idle _cpu_ 02:57:31 all 62 0 1 0 0 1 36 0 22 0 3 0 0 1 74 1 100 0 0 0 0 0 0 02:57:32 all 51 0 0 0 0 0 48 0 2 0 1 0 1 0 96 1 100 0 0 0 0 0 0 02:57:33 all 53 0 0 0 0 0 46 0 7 0 1 0 0 0 92 1 100 0 0 0 0 0 0 02:57:34 all 54 0 1 0 0 1 44 0 8 0 3 1 0 2 86 1 100 0 0 0 0 0 0 02:57:35 all 55 0 2 0 0 0 44 0 10 0 3 0 0 0 87 1 100 0 0 0 0 0 0 02:57:36 all 55 0 0 0 0 0 44 0 10 0 0 0 0 0 90 1 100 0 0 0 0 0 0 02:57:37 all 55 0 0 0 0 0 44 0 10 0 2 0 0 1 87 1 100 0 0 0 0 0 0 02:57:38 all 54 0 2 0 0 0 44 0 7 0 4 0 1 0 88 1 100 0 0 0 0 0 0 02:57:39 all 52 0 1 0 0 1 46 0 6 0 2 0 0 2 90 1 98 0 1 0 0 0 1 02:57:40 all 26 0 5 0 0 0 69 0 8 0 6 0 1 0 85 1 44 0 3 0 0 0 53
Code:61.75user 0.14system 0:33.21elapsed 186%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+255minor)pagefaults 0swaps Linux debian 2.6.26-2-686 #1 SMP Thu Mar 26 01:08:11 UTC 2009 i686 05/28/2009 02:57:43 cpu %usr %nice %sys %irq %softirq %wait %idle _cpu_ 02:57:43 all 99 0 2 0 0 0 0 0 99 0 3 0 0 0 0 1 100 0 0 0 0 0 0 02:57:44 all 99 0 0 0 0 0 0 0 99 0 0 0 1 0 0 1 99 0 1 0 0 0 0 02:57:45 all 100 0 0 0 0 0 0 0 99 0 1 0 0 0 0 1 100 0 1 0 0 0 0 02:57:46 all 99 0 0 0 0 0 0 0 99 0 0 1 1 0 0 1 100 0 0 0 0 0 0 02:57:47 all 100 0 0 0 0 0 0 0 100 0 0 0 0 0 0 1 99 0 0 0 0 0 1 02:57:48 all 99 0 1 0 0 0 0 0 98 0 2 0 0 0 0 1 100 0 1 0 0 0 0 02:57:49 all 100 0 0 0 0 0 0 0 100 0 0 0 0 0 0 1 100 0 0 0 0 0 0 02:57:50 all 99 0 1 0 0 0 0 0 99 0 0 0 0 0 1 1 99 0 1 0 0 0 0 02:57:51 all 98 0 2 0 0 0 0 0 99 0 1 0 0 0 0 1 96 0 4 0 0 0 0 02:57:52 all 100 0 0 0 0 0 0 0 99 0 1 0 0 0 0 1 100 0 0 0 0 0 0
- 05-28-2009 #10Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
There is no profiling information in your outputs.
Have you installed oprofile? Did you get any error messages when you execute .profile.sh?


Reply With Quote
