  1. #1
    Just Joined!
    Join Date
    Jul 2008
    Posts
    8

    Getting started with multithreading (POSIX)

    Hi folks!
    I'm having some trouble using the POSIX threads. Here's what I do:
    I have a very complicated computation problem, split it up in 4 threads (each one computing part of it on its own). Then I join the threads and compare their results.

    I hoped to get a huge speed benefit when running this on a multicore machine, but there is none!!!
    It takes about 90 seconds on a single, a dual and a quad core machine.
    So I took a look at the CPU usage. It seems that the 100% usage I get on the single core is evenly distributed when there are more cores (so on the quad core each core gets about 25% usage). That would explain why there is no speed benefit...

    Here's the piece of code I use (nothing fancy, just creating and joining the threads):
    Code:
    // Run threads
        int  iret1, iret2, iret3, iret4;
        iret1 = pthread_create( &thread1, NULL, evalThreadLAH2, (void*) &ti1);
        iret2 = pthread_create( &thread2, NULL, evalThreadLAH2, (void*) &ti2);
        iret3 = pthread_create( &thread3, NULL, evalThreadLAH2, (void*) &ti3);
        iret4 = pthread_create( &thread4, NULL, evalThreadLAH2, (void*) &ti4);
    
        // Wait for threads to finish
        pthread_join( thread1, NULL);
        pthread_join( thread2, NULL);
        pthread_join( thread3, NULL);
        pthread_join( thread4, NULL);
    Is there anything I have to do to make the threads use the full CPU power? As you might have noticed, I'm quite new to this kind of stuff, so there might be a simple solution I have overlooked so far.

    Thanks for your help!!!

    PS:
    Some system information: Ubuntu 8.04, Code::Blocks IDE, C++, Intel Q6600 Processor (and some others)
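
A minimal sketch of the create/join pattern described above, with the work split into independent slices. The `thread_info` struct, `eval_slice`, and `parallel_sum` are hypothetical stand-ins, since the real `ti1`..`ti4` and `evalThreadLAH2` are not shown in this thread. The sketch also checks the return value of `pthread_create`, which the snippet above stores but never inspects.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Hypothetical per-thread work description (not the poster's real struct). */
typedef struct {
    long begin;     /* first index of this thread's slice (inclusive) */
    long end;       /* one past the last index of the slice */
    long long sum;  /* result, written only by the owning thread */
} thread_info;

/* Stand-in for evalThreadLAH2: sums the integers in [begin, end). */
static void *eval_slice(void *arg)
{
    thread_info *ti = arg;
    long long local = 0;
    for (long i = ti->begin; i < ti->end; i++)
        local += i;
    ti->sum = local;
    return NULL;
}

/* Splits [0, n) into NUM_THREADS slices, runs them in parallel and
   returns the combined sum, or -1 if a thread could not be created. */
long long parallel_sum(long n)
{
    pthread_t threads[NUM_THREADS];
    thread_info info[NUM_THREADS];
    long chunk = n / NUM_THREADS;
    int created = 0;
    int failed = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        info[t].begin = t * chunk;
        info[t].end = (t == NUM_THREADS - 1) ? n : (t + 1) * chunk;
        info[t].sum = 0;
        int rc = pthread_create(&threads[t], NULL, eval_slice, &info[t]);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed with error %d\n", rc);
            failed = 1;
            break;
        }
        created++;
    }

    /* Join only the threads that were actually started. */
    long long total = 0;
    for (int t = 0; t < created; t++) {
        pthread_join(threads[t], NULL);
        total += info[t].sum;
    }
    return failed ? -1 : total;
}
```

Each thread touches only its own `thread_info`, so the threads never wait on each other until the join. Whether the real computation can be split this cleanly is an assumption.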

  2. #2
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Hi,
    As you described, the CPU utilization on the quad core is about 25% per core, so I think your threads are not competing for CPU resources with threads in other processes.

    If this is the case, adding the following to the beginning of your program will get you what you want:
    Code:
      pthread_attr_t attr; 
      pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    You might want to read about thread scheduling contention scope in:
    Threads

    Hope this helps.

    -Steve

  3. #3
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Sorry, I missed one line in my code, it should have been:

    Code:
    pthread_attr_t attr; 
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    -Steve
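
Before relying on the contention scope, it may be worth checking what the system actually reports. As far as I know, Linux's threading library supports only `PTHREAD_SCOPE_SYSTEM` and uses it by default, in which case the call above would change nothing. The sketch below queries the default scope and reports whether a requested scope is accepted; the helper names `default_contention_scope` and `try_set_scope` are made up for illustration.

```c
#include <pthread.h>

/* Returns the contention scope stored in a freshly initialised
   attribute object, or -1 if it could not be queried. */
int default_contention_scope(void)
{
    pthread_attr_t attr;
    int scope = -1;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_getscope(&attr, &scope) != 0)
        scope = -1;
    pthread_attr_destroy(&attr);
    return scope;
}

/* Tries to request the given scope and returns the error number
   reported by the library (0 on success). */
int try_set_scope(int scope)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setscope(&attr, scope);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Comparing `default_contention_scope()` with `PTHREAD_SCOPE_SYSTEM`, and checking whether `try_set_scope(PTHREAD_SCOPE_PROCESS)` returns a nonzero error, shows quickly whether the scope setting can make any difference on a given machine.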

  4. #4
    Just Joined!
    Join Date
    Jul 2008
    Posts
    8
    Hi Steve!
    Thanks for your reply. I'll check that out as soon as I get home.

  5. #5
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Hi, sorry, I forgot to tell you to specify the attribute in each of your pthread_create calls.
    e.g.
    iret1 = pthread_create( &thread1, NULL, evalThreadLAH2, (void*) &ti1);
    should be replaced with
    Code:
    iret1 = pthread_create( &thread1, &attr, evalThreadLAH2, (void*) &ti1);
    -Steve

  6. #6
    Just Joined!
    Join Date
    Jul 2008
    Posts
    8
    Hi Steve,
    I tried your suggestion but nothing changed. Computation still takes damn old 66 seconds, both cores are only used to a maximum of 60%. So there's still some CPU-wasting going on

    Did I miss something? Here's the new code (just like you suggested):

    Code:
    // Setup thread attributes
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    
        // Run threads
        int  iret1, iret2, iret3, iret4;
        iret1 = pthread_create( &thread1, &attr, evalThreadLAH2, (void*) &ti1);
        iret2 = pthread_create( &thread2, &attr, evalThreadLAH2, (void*) &ti2);
        iret3 = pthread_create( &thread3, &attr, evalThreadLAH2, (void*) &ti3);
        iret4 = pthread_create( &thread4, &attr, evalThreadLAH2, (void*) &ti4);
    
        // Wait for threads to finish
        pthread_join( thread1, NULL);
        pthread_join( thread2, NULL);
        pthread_join( thread3, NULL);
        pthread_join( thread4, NULL);

  7. #7
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Hi,

    While your program is running on your 2-CPU machine, could you run "sar -P ALL 1 5" and post the result here?

    To help isolate the problem, I would appreciate it if you could run the following simplified version of your program (with a simple thread routine in place of evalThreadLAH2):

    Code:
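    //thread.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <pthread.h>
    #define MAXTHREAD 4
    
    /* Busy loop that keeps one CPU fully occupied; the argument is unused. */
    void *evalThreadLAH2(void *array)
    {
      volatile int i;  /* volatile so an optimising compiler keeps the loop */
      int j;
      (void)array;
      for (j = 0; j < 1000000000; j++) { i = j; }
      pthread_exit(NULL);
    }
    
    int main(int argc, char **argv)
    {
      pthread_t thread[MAXTHREAD];
      int ti[MAXTHREAD];
      int iret, m, numothread;
    
      if (argc != 2) {
        printf("Usage: thread n (n=1,2,3 or 4)\n");
        return 1;  /* no thread count given, so there is nothing to run */
      }
      numothread = atoi(argv[1]);
      if (numothread > MAXTHREAD) numothread = MAXTHREAD;
    
      pthread_attr_t attr;
      pthread_attr_init(&attr);
      pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    
      for (m = 0; m < numothread; m++) {
        iret = pthread_create(&thread[m], &attr, evalThreadLAH2, (void *)&ti[m]);
        if (iret != 0) {
          fprintf(stderr, "pthread_create failed with error %d\n", iret);
          numothread = m;  /* join only the threads that were started */
          break;
        }
      }
    
      for (m = 0; m < numothread; m++)
          { pthread_join(thread[m], NULL); }
    
      pthread_attr_destroy(&attr);
      return 0;
    }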
    Please do the following:
    > cc -o thread thread.c -lpthread
    > time ./thread 1
    > time ./thread 2
    I expect the real (elapsed) time to be roughly the same for both the single-thread and two-thread cases on your 2-CPU machine, while the user time roughly doubles.

    If they are, we need to know more about your thread routine evalThreadLAH2.
    If they are not, use sar -P ALL 1 100 to monitor the CPU utilization while (./thread 2) is executing and post the relevant section of the sar output here.
    Thanks.

    -Steve
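
The comparison above can also be made from inside the program by measuring wall-clock time around the create/join phase. This is a rough sketch, not part of the test program: `spin` and `time_threads` are hypothetical names, and the iteration count is passed in so that short runs are possible.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 4

/* Busy loop standing in for the real thread routine. */
static void *spin(void *arg)
{
    long iterations = *(long *)arg;
    volatile long sink = 0;  /* volatile keeps the loop from being removed */
    for (long j = 0; j < iterations; j++)
        sink = j;
    return NULL;
}

/* Runs nthreads copies of spin() and returns the elapsed wall-clock
   time in seconds, or -1.0 on a bad argument or a failed create. */
double time_threads(int nthreads, long iterations)
{
    pthread_t threads[MAX_THREADS];
    struct timespec start, stop;
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int m = 0; m < nthreads; m++) {
        if (pthread_create(&threads[m], NULL, spin, &iterations) != 0)
            break;
        created++;
    }
    for (int m = 0; m < created; m++)
        pthread_join(threads[m], NULL);
    clock_gettime(CLOCK_MONOTONIC, &stop);

    if (created != nthreads)
        return -1.0;
    return (double)(stop.tv_sec - start.tv_sec)
         + (double)(stop.tv_nsec - start.tv_nsec) / 1e9;
}
```

If `time_threads(2, n)` takes about as long as `time_threads(1, n)` on a dual-core machine, the threads are running in parallel; if it takes about twice as long, they are being serialised somewhere.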

  8. #8
    Just Joined!
    Join Date
    Jul 2008
    Posts
    8
    Hi Steve!
    First of all I have to thank you. I've never seen anyone in a forum so helpful


    Here are the results of your code on the dual core:

    Code:
    time ./thread 1 gives:
    
    real	0m2.701s
    user	0m2.696s
    sys	0m0.004s
    
    
    time ./thread 2 gives:
    
    real	0m3.575s
    user	0m5.636s
    sys	0m0.028s
    And the sar output:
    Code:
    06:48:06 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:48:07 AM     all      2.96      0.00      0.99      0.00      0.00     96.06
    06:48:07 AM       0      2.04      0.00      0.00      0.00      0.00     97.96
    06:48:07 AM       1      3.88      0.00      0.97      0.00      0.00     95.15
    06:48:07 AM       2      0.00      0.00      0.00      0.00      0.00      0.00
    
    06:48:07 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:48:08 AM     all      4.88      0.00      0.98      0.00      0.00     94.15
    06:48:08 AM       0      7.84      0.00      0.98      0.00      0.00     91.18
    06:48:08 AM       1      1.92      0.00      0.96      0.00      0.00     97.12
    06:48:08 AM       2      0.00      0.00      0.00      0.00      0.00      0.00
    
    06:48:08 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:48:09 AM     all      5.94      0.00      0.50      0.00      0.00     93.56
    06:48:09 AM       0     11.00      0.00      1.00      0.00      0.00     88.00
    06:48:09 AM       1      1.94      0.00      0.00      0.00      0.00     98.06
    06:48:09 AM       2      0.00      0.00      0.00      0.00      0.00      0.00
    
    06:48:09 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:48:10 AM     all      5.00      0.00      1.00      0.00      0.00     94.00
    06:48:10 AM       0      7.45      0.00      0.00      0.00      0.00     92.55
    06:48:10 AM       1      1.92      0.00      0.96      0.00      0.00     97.12
    06:48:10 AM       2      0.00      0.00      0.00      0.00      0.00      0.00
    
    06:48:10 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:48:11 AM     all      5.77      0.00      0.48      0.00      0.00     93.75
    06:48:11 AM       0     10.48      0.00      0.00      0.00      0.00     89.52
    06:48:11 AM       1      1.92      0.00      0.96      0.00      0.00     97.12
    06:48:11 AM       2      0.00      0.00      0.00      0.00      0.00      0.00
    
    Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
    Average:        all      4.91      0.00      0.79      0.00      0.00     94.30
    Average:          0      7.82      0.00      0.40      0.00      0.00     91.78
    Average:          1      2.32      0.00      0.77      0.00      0.00     96.91
    Average:          2      0.00      0.00      0.00      0.00      0.00      0.00

    I also did both on the single core (just in case):

    Code:
    time ./thread 1 gives:
    
    real	0m3.684s
    user	0m3.572s
    sys	0m0.004s
    
    
    time ./thread 2 gives:
    
    real	0m7.145s
    user	0m7.024s
    sys	0m0.024s
    And the sar output:
    Code:
    12:30:03 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    12:30:04 PM     all      2.00      0.00      4.00      0.00      0.00     94.00
    12:30:04 PM       0      2.00      0.00      4.00      0.00      0.00     94.00
    12:30:04 PM       1      0.00      0.00      0.00      0.00      0.00      0.00
    
    12:30:04 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    12:30:05 PM     all      3.00      0.00     12.00      0.00      0.00     85.00
    12:30:05 PM       0      3.00      0.00     12.00      0.00      0.00     85.00
    12:30:05 PM       1      0.00      0.00      0.00      0.00      0.00      0.00
    
    12:30:05 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    12:30:06 PM     all      8.00      0.00     17.00      0.00      0.00     75.00
    12:30:06 PM       0      8.00      0.00     17.00      0.00      0.00     75.00
    12:30:06 PM       1      0.00      0.00      0.00      0.00      0.00      0.00
    
    12:30:06 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    12:30:07 PM     all      9.00      0.00     28.00      0.00      0.00     63.00
    12:30:07 PM       0      9.00      0.00     28.00      0.00      0.00     63.00
    12:30:07 PM       1      0.00      0.00      0.00      0.00      0.00      0.00
    
    12:30:07 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    12:30:08 PM     all     22.77      0.00     35.64      0.00      0.00     41.58
    12:30:08 PM       0     22.77      0.00     35.64      0.00      0.00     41.58
    12:30:08 PM       1      0.00      0.00      0.00      0.00      0.00      0.00
    
    Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
    Average:        all      8.98      0.00     19.36      0.00      0.00     71.66
    Average:          0      8.98      0.00     19.36      0.00      0.00     71.66
    Average:          1      0.00      0.00      0.00      0.00      0.00      0.00

    So what's next? You said if times weren't the same I should try the "sar -P ALL 1 100" thingy?
    I'll do that then next...


    Thanks a lot (again) mate!!!!

  9. #9
    Just Joined!
    Join Date
    Jun 2008
    Posts
    34
    Hi,
    Thanks for running the test.
    The sar numbers do not look right. Take the single-core (1 CPU) result: it takes about 3.5 s for 1 thread to run. As the thread routine is a tight loop, I would expect the %user CPU utilization to be close to 100% for about 3 s (this is what I see on my system, which has a single Intel Celeron CPU), not just a few percent. Could you describe how you collected the sar output? I just want to make sure we are looking at the correct interval. Thanks.

    -Steve

  10. #10
    Just Joined!
    Join Date
    Jul 2008
    Posts
    8
    Ah, now I get what this sar command is all about. So I have to admit that the values posted above are useless (because they were taken right AFTER the other app finished).

    Now I did it right and got the following results:
    Your thread app made both cores go up to 100%. Then I tested mine again: here sar says what I saw before on the system monitor (not more than 50%).

    So I guess there's nothing wrong with POSIX then
    Somehow my threads must be designed the wrong way....
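
The thread does not say what was wrong with the real thread routine, so the following is only an illustration of one common design problem, not a diagnosis. If every thread takes the same lock inside its inner loop, the threads mostly wait for each other and the cores stay partly idle. Accumulating into a local variable and taking the lock once per thread avoids that. All names here (`worker_arg`, `worker_contended`, `worker_local`, `run_workers`) are hypothetical.

```c
#include <pthread.h>

#define NUM_WORKERS 4

typedef struct {
    long iterations;        /* work items per thread */
    long long *total;       /* shared result */
    pthread_mutex_t *lock;  /* protects *total */
} worker_arg;

/* Serialising design: the shared lock is taken on every iteration. */
static void *worker_contended(void *p)
{
    worker_arg *a = p;
    for (long i = 0; i < a->iterations; i++) {
        pthread_mutex_lock(a->lock);
        *a->total += 1;
        pthread_mutex_unlock(a->lock);
    }
    return NULL;
}

/* Scalable design: accumulate locally, lock once at the end. */
static void *worker_local(void *p)
{
    worker_arg *a = p;
    long long local = 0;
    for (long i = 0; i < a->iterations; i++)
        local += 1;
    pthread_mutex_lock(a->lock);
    *a->total += local;
    pthread_mutex_unlock(a->lock);
    return NULL;
}

/* Runs NUM_WORKERS threads executing fn and returns the shared total. */
static long long run_workers(void *(*fn)(void *), long iterations)
{
    pthread_t threads[NUM_WORKERS];
    pthread_mutex_t lock;
    long long total = 0;
    worker_arg arg = { iterations, &total, &lock };
    int created = 0;

    pthread_mutex_init(&lock, NULL);
    for (int t = 0; t < NUM_WORKERS; t++) {
        if (pthread_create(&threads[t], NULL, fn, &arg) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&lock);
    return total;
}

long long run_contended(long iterations) { return run_workers(worker_contended, iterations); }
long long run_local(long iterations)     { return run_workers(worker_local, iterations); }
```

Both variants produce the same total, but with a large iteration count the contended one typically shows the partial CPU usage described above, while the local one can keep every core busy. Hidden shared locks inside library calls can have a similar effect.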
