- 07-16-2008 #1Just Joined!
- Join Date
- Jul 2008
- Posts
- 8
Getting started with multithreading (POSIX)
Hi folks!
I'm having some trouble using the POSIX threads. Here's what I do:
I have a very complicated computation problem, split it up in 4 threads (each one computing part of it on its own). Then I join the threads and compare their results.
I hoped to get a huge speed benefit when running this on a multicore machine, but there is none!!!
It takes about 90 seconds on a single, a dual and a quad core machine.
So I took a look at the CPU usage. It seems that the 100% usage I get on the single core is evenly distributed when there are more cores (so on the quad core each core gets about 25% usage). That would explain why there is no speed benefit...
Here's the piece of code I use (nothing fancy, just creating and joining the threads):
Code:
    // Run threads
    int iret1, iret2, iret3, iret4;
    iret1 = pthread_create( &thread1, NULL, evalThreadLAH2, (void*) &ti1);
    iret2 = pthread_create( &thread2, NULL, evalThreadLAH2, (void*) &ti2);
    iret3 = pthread_create( &thread3, NULL, evalThreadLAH2, (void*) &ti3);
    iret4 = pthread_create( &thread4, NULL, evalThreadLAH2, (void*) &ti4);
    // Wait for threads to finish
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);
    pthread_join( thread3, NULL);
    pthread_join( thread4, NULL);
Is there anything I have to do to make the threads use the full CPU power? As you might have noticed, I'm quite new to this kind of stuff, so there might be a simple solution I have overlooked so far.
Thanks for your help!!!
PS:
Some system information: Ubuntu 8.04, Code::Blocks IDE, C++, Intel Q6600 Processor (and some others)
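For reference, the setup described in this post (split the work, run the parts in threads, join, then combine the results) can be sketched as below. This is only an illustration under assumptions: `work_item`, `sum_range`, and `run_partitioned_sum` are hypothetical names, and summing a range of integers stands in for the real `evalThreadLAH2` computation, which is not shown in the thread. The sketch also checks the return value of `pthread_create`, which the original snippet stores but never inspects.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 16

/* Hypothetical per-thread work item: a half-open range and a private result slot. */
struct work_item {
    long begin;
    long end;
    long long result;
};

/* Stand-in for the real computation: sums the integers in [begin, end). */
static void *sum_range(void *arg)
{
    struct work_item *w = arg;
    long long acc = 0;
    for (long i = w->begin; i < w->end; i++)
        acc += i;
    w->result = acc;   /* each thread writes only to its own slot */
    return NULL;
}

/* Splits [0, n) across nthreads threads; returns the total, or -1 on error. */
long long run_partitioned_sum(int nthreads, long n)
{
    pthread_t tid[MAX_THREADS];
    struct work_item item[MAX_THREADS];
    int created = 0;
    long long total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || n < 0)
        return -1;

    for (int t = 0; t < nthreads; t++) {
        item[t].begin = n * t / nthreads;
        item[t].end = n * (t + 1) / nthreads;
        item[t].result = 0;
        int rc = pthread_create(&tid[t], NULL, sum_range, &item[t]);
        if (rc != 0) {   /* pthread_create returns an error number, not -1 */
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            break;
        }
        created++;
    }
    /* Join only the threads that were actually started, then combine. */
    for (int t = 0; t < created; t++) {
        pthread_join(tid[t], NULL);
        total += item[t].result;
    }
    return created == nthreads ? total : -1;
}
```

Calling `run_partitioned_sum(4, 1000)` splits the range into four independent pieces. Because every thread writes only to its own `result` slot, no locking is needed until after the join.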
- 07-18-2008 #2Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
Hi,
Since you describe the CPU utilization on the quad core as 25% per core, I think your threads are not competing for CPU resources with threads in other processes.
If this is the case, adding the following to the beginning of your program will get you what you want:
Code:
    pthread_attr_t attr;
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
You might want to read about thread scheduling contention scope in:
Threads
Hope this helps.
-Steve
- 07-18-2008 #3Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
Sorry, I missed one line in my code, it should have been:
Code:
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
-Steve
- 07-18-2008 #4Just Joined!
- Join Date
- Jul 2008
- Posts
- 8
Hi Steve!
Thanks for your reply. I'll check that out as soon as I get home.
- 07-19-2008 #5Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
Hi, sorry, I forgot to tell you to specify the attribute in each of your pthread_create calls.
e.g.
iret1 = pthread_create( &thread1, NULL, evalThreadLAH2, (void*) &ti1);
shall be replaced with
Code:
    iret1 = pthread_create( &thread1, &attr, evalThreadLAH2, (void*) &ti1);
-Steve
- 07-20-2008 #6Just Joined!
- Join Date
- Jul 2008
- Posts
- 8
Hi Steve,
I tried your suggestion, but nothing changed. The computation still takes the same old 66 seconds, and both cores are only used up to a maximum of 60%. So there's still some CPU wasting going on.
Did I miss something? Here's the new code (just like you suggested):
Code:
    // Setup thread attributes
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    // Run threads
    int iret1, iret2, iret3, iret4;
    iret1 = pthread_create( &thread1, &attr, evalThreadLAH2, (void*) &ti1);
    iret2 = pthread_create( &thread2, &attr, evalThreadLAH2, (void*) &ti2);
    iret3 = pthread_create( &thread3, &attr, evalThreadLAH2, (void*) &ti3);
    iret4 = pthread_create( &thread4, &attr, evalThreadLAH2, (void*) &ti4);
    // Wait for threads to finish
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);
    pthread_join( thread3, NULL);
    pthread_join( thread4, NULL);
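One possible reason the attribute made no difference: on Linux, the pthread_attr_setscope man page states that only PTHREAD_SCOPE_SYSTEM is supported, so threads already compete system-wide by default. The sketch below is a way to check this on a given machine rather than a statement about this particular setup; `default_contention_scope` and `report_contention_scope` are hypothetical helper names, while the pthread calls are the standard ones.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Returns the contention scope a freshly initialised attribute object carries,
 * or -1 if it cannot be queried. */
int default_contention_scope(void)
{
    pthread_attr_t attr;
    int scope = -1;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_getscope(&attr, &scope) != 0)
        scope = -1;
    pthread_attr_destroy(&attr);
    return scope;
}

/* Prints the default scope and whether process scope can be requested at all. */
void report_contention_scope(void)
{
    pthread_attr_t attr;
    int scope = default_contention_scope();

    printf("default scope: %s\n",
           scope == PTHREAD_SCOPE_SYSTEM ? "PTHREAD_SCOPE_SYSTEM" :
           scope == PTHREAD_SCOPE_PROCESS ? "PTHREAD_SCOPE_PROCESS" : "unknown");

    if (pthread_attr_init(&attr) != 0)
        return;
    /* pthread_attr_setscope returns an error number (0 on success). */
    int rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
    printf("requesting PTHREAD_SCOPE_PROCESS: %s\n",
           rc == 0 ? "accepted" : strerror(rc));
    pthread_attr_destroy(&attr);
}
```

If the default already reports PTHREAD_SCOPE_SYSTEM, setting it explicitly changes nothing, and the low CPU usage has to come from somewhere else.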
- 07-21-2008 #7Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
Hi,
While your program is running on your 2cpu machine, could you do "sar -P ALL 1 5" and post the result here?
To help isolate the problem, I'd appreciate it if you could run the following simplified version of your program (with a simple thread routine evalThreadLAH2):
Code:
    //thread.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <pthread.h>

    #define MAXTHREAD 4

    /* Simple CPU-bound thread routine: a tight loop with no I/O or locking. */
    void *evalThreadLAH2(void *array)
    {
        int i, j;
        for (j=0;j<1000000000;j++) {i=j;}
        pthread_exit(NULL);
    }

    int main(int argc, char **argv)
    {
        pthread_t thread[MAXTHREAD];
        int ti[MAXTHREAD];
        int iret, m, numothread;

        if (argc==2) {
            numothread=atoi(argv[1]);
            if (numothread>MAXTHREAD) numothread=MAXTHREAD;
        } else {
            printf ("Usage: thread n (n=1,2,3 or 4)\n");
            return 1; /* stop here: numothread would otherwise be uninitialized */
        }

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

        for (m=0;m<numothread;m++) {
            iret=pthread_create(&thread[m], &attr, evalThreadLAH2, (void *)&ti[m]);
        }
        for (m=0;m<numothread;m++) {
            pthread_join(thread[m], NULL);
        }
        return 0;
    }
Please do the following:
> cc -o thread thread.c -lpthread
> time ./thread 1
> time ./thread 2
I expect the above time (real) to be roughly the same for both the single- and dual-thread cases on your 2-CPU machine; the user time should roughly double, since each thread runs the full loop.
If they are, we need to know more about your thread routine evalThreadLAH2.
If they are not, use sar -P ALL 1 100 to monitor the CPU utilization while (./thread 2) is executing and post the appropriate section of the sar output here.
Thanks.
-Steve
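The same comparison that `time` gives from the shell can also be made from inside a program. The following is a minimal sketch, assuming a POSIX system with `clock_gettime`; `spin`, `seconds_on`, and `time_spin_threads` are hypothetical names, and the busy loop only stands in for a CPU-bound thread routine. It measures elapsed wall-clock time (what `time` reports as real) and CPU time summed over all threads of the process (roughly user plus sys).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define MAX_SPIN_THREADS 16

/* Busy loop; volatile keeps the compiler from removing it under optimisation. */
static void *spin(void *arg)
{
    long n = *(long *)arg;
    volatile long sink = 0;
    for (long j = 0; j < n; j++)
        sink = j;
    (void)sink;
    return NULL;
}

/* Reads the given clock and returns its value in seconds. */
static double seconds_on(clockid_t clock)
{
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs nthreads spinning threads and stores the elapsed wall-clock and process
 * CPU seconds. Returns 0 on success, -1 on bad arguments or if a thread could
 * not be created. */
int time_spin_threads(int nthreads, long iterations, double *wall_s, double *cpu_s)
{
    pthread_t tid[MAX_SPIN_THREADS];
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_SPIN_THREADS)
        return -1;

    double wall0 = seconds_on(CLOCK_MONOTONIC);
    double cpu0 = seconds_on(CLOCK_PROCESS_CPUTIME_ID);

    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tid[t], NULL, spin, &iterations) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);

    *wall_s = seconds_on(CLOCK_MONOTONIC) - wall0;
    *cpu_s = seconds_on(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    return created == nthreads ? 0 : -1;
}
```

On a machine with at least two free cores, going from one thread to two should leave the wall-clock time about the same while the CPU time roughly doubles. If the wall-clock time doubles instead, the threads are effectively running one after the other.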
- 07-21-2008 #8Just Joined!
- Join Date
- Jul 2008
- Posts
- 8
Hi Steve!
First of all I have to thank you. I've never seen anyone in a forum so helpful
Here are the results of your code on the dual core:
Code:
    time ./thread 1 gives:
    real 0m2.701s
    user 0m2.696s
    sys  0m0.004s

    time ./thread 2 gives:
    real 0m3.575s
    user 0m5.636s
    sys  0m0.028s
And the sar output:
Code:
    06:48:06 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    06:48:07 AM  all   2.96   0.00     0.99     0.00    0.00  96.06
    06:48:07 AM    0   2.04   0.00     0.00     0.00    0.00  97.96
    06:48:07 AM    1   3.88   0.00     0.97     0.00    0.00  95.15
    06:48:07 AM    2   0.00   0.00     0.00     0.00    0.00   0.00

    06:48:07 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    06:48:08 AM  all   4.88   0.00     0.98     0.00    0.00  94.15
    06:48:08 AM    0   7.84   0.00     0.98     0.00    0.00  91.18
    06:48:08 AM    1   1.92   0.00     0.96     0.00    0.00  97.12
    06:48:08 AM    2   0.00   0.00     0.00     0.00    0.00   0.00

    06:48:08 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    06:48:09 AM  all   5.94   0.00     0.50     0.00    0.00  93.56
    06:48:09 AM    0  11.00   0.00     1.00     0.00    0.00  88.00
    06:48:09 AM    1   1.94   0.00     0.00     0.00    0.00  98.06
    06:48:09 AM    2   0.00   0.00     0.00     0.00    0.00   0.00

    06:48:09 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    06:48:10 AM  all   5.00   0.00     1.00     0.00    0.00  94.00
    06:48:10 AM    0   7.45   0.00     0.00     0.00    0.00  92.55
    06:48:10 AM    1   1.92   0.00     0.96     0.00    0.00  97.12
    06:48:10 AM    2   0.00   0.00     0.00     0.00    0.00   0.00

    06:48:10 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    06:48:11 AM  all   5.77   0.00     0.48     0.00    0.00  93.75
    06:48:11 AM    0  10.48   0.00     0.00     0.00    0.00  89.52
    06:48:11 AM    1   1.92   0.00     0.96     0.00    0.00  97.12
    06:48:11 AM    2   0.00   0.00     0.00     0.00    0.00   0.00

    Average:     CPU  %user  %nice  %system  %iowait  %steal  %idle
    Average:     all   4.91   0.00     0.79     0.00    0.00  94.30
    Average:       0   7.82   0.00     0.40     0.00    0.00  91.78
    Average:       1   2.32   0.00     0.77     0.00    0.00  96.91
    Average:       2   0.00   0.00     0.00     0.00    0.00   0.00
I also did both on the single core (just in case):
Code:
    time ./thread 1 gives:
    real 0m3.684s
    user 0m3.572s
    sys  0m0.004s

    time ./thread 2 gives:
    real 0m7.145s
    user 0m7.024s
    sys  0m0.024s
And the sar output:
Code:
    12:30:03 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:30:04 PM  all   2.00   0.00     4.00     0.00    0.00  94.00
    12:30:04 PM    0   2.00   0.00     4.00     0.00    0.00  94.00
    12:30:04 PM    1   0.00   0.00     0.00     0.00    0.00   0.00

    12:30:04 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:30:05 PM  all   3.00   0.00    12.00     0.00    0.00  85.00
    12:30:05 PM    0   3.00   0.00    12.00     0.00    0.00  85.00
    12:30:05 PM    1   0.00   0.00     0.00     0.00    0.00   0.00

    12:30:05 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:30:06 PM  all   8.00   0.00    17.00     0.00    0.00  75.00
    12:30:06 PM    0   8.00   0.00    17.00     0.00    0.00  75.00
    12:30:06 PM    1   0.00   0.00     0.00     0.00    0.00   0.00

    12:30:06 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:30:07 PM  all   9.00   0.00    28.00     0.00    0.00  63.00
    12:30:07 PM    0   9.00   0.00    28.00     0.00    0.00  63.00
    12:30:07 PM    1   0.00   0.00     0.00     0.00    0.00   0.00

    12:30:07 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:30:08 PM  all  22.77   0.00    35.64     0.00    0.00  41.58
    12:30:08 PM    0  22.77   0.00    35.64     0.00    0.00  41.58
    12:30:08 PM    1   0.00   0.00     0.00     0.00    0.00   0.00

    Average:     CPU  %user  %nice  %system  %iowait  %steal  %idle
    Average:     all   8.98   0.00    19.36     0.00    0.00  71.66
    Average:       0   8.98   0.00    19.36     0.00    0.00  71.66
    Average:       1   0.00   0.00     0.00     0.00    0.00   0.00
So what's next? You said if times weren't the same I should try the "sar -P ALL 1 100" thingy?
I'll do that then next...
Thanks a lot (again) mate!!!!
- 07-21-2008 #9Just Joined!
- Join Date
- Jun 2008
- Posts
- 34
Hi,
Thanks for running the test.
The sar numbers do not look right. Take the single-core (1 CPU) result: it takes 3.5 s for 1 thread to run. As the thread routine is a tight loop, I would expect the %user CPU utilization to be close to 100% for about 3 s (this is what I see on my system, which has a single Intel Celeron CPU), not just a few percent. Could you describe how you collected the sar output? I just want to make sure we are looking at the correct interval. Thanks.
-Steve
- 07-21-2008 #10Just Joined!
- Join Date
- Jul 2008
- Posts
- 8
Ah, now I get what this sar command is all about. So I have to admit that the values posted above are useless (because they were taken right AFTER the other app finished).
Now I did it right and got the following results:
Your thread app made both cores go up to 100%. Then I tested mine again: here sar says what I saw before on the system monitor (not more than 50%).
So I guess there's nothing wrong with POSIX then
Somehow my threads must be designed the wrong way....
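One way to narrow this down is to measure, inside a thread, how much of the elapsed time the thread actually spent on a CPU. The sketch below is only an illustration under assumptions: `busy_fraction`, `sample_wait`, and `sample_compute` are hypothetical names, and the two sample workloads stand in for the real thread routine, which is not shown in the thread. A fraction well below 1 means the thread spends most of its time waiting (for example on locks, I/O, or sleeps) rather than computing.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Reads the given clock and returns its value in seconds. */
static double seconds_on(clockid_t clock)
{
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical workload that mostly waits: sleeps for 50 ms. */
void sample_wait(void)
{
    struct timespec delay = { 0, 50L * 1000L * 1000L };
    nanosleep(&delay, NULL);
}

/* Hypothetical workload that only computes: a tight loop. */
void sample_compute(void)
{
    volatile long sink = 0;
    for (long j = 0; j < 20000000L; j++)
        sink = j;
    (void)sink;
}

/* Returns the CPU time consumed by the calling thread divided by the
 * wall-clock time that elapsed while work() ran. */
double busy_fraction(void (*work)(void))
{
    double wall0 = seconds_on(CLOCK_MONOTONIC);
    double cpu0 = seconds_on(CLOCK_THREAD_CPUTIME_ID);

    work();

    double wall = seconds_on(CLOCK_MONOTONIC) - wall0;
    double cpu = seconds_on(CLOCK_THREAD_CPUTIME_ID) - cpu0;
    return wall > 0.0 ? cpu / wall : 0.0;
}
```

Wrapping the body of the real thread routine in the same two pairs of clock reads would show whether each of the four threads is busy or mostly blocked.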

