First, have a look at the following two programs.

Code (single thread version):
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *thread (void *arg)
{
	/*
	This function increments the (long long int) variable pointed to by
	arg until it reaches 100000000 and then returns.
	*/

	long long int *p = (long long int *)arg;
	while (*p < 100000000) (*p)++;
	pthread_exit (NULL);
}

int main (void)
{
	pthread_t t1, t2;
	long long int x = 0, y = 0;

/* Launch "thread" with x as its argument and wait for it to return. */

	pthread_create (&t1, NULL, thread, &x);
	pthread_join (t1, NULL);

/* Launch "thread" again, this time with y as its argument, and wait for it to return. */

	pthread_create (&t2, NULL, thread, &y);
	pthread_join (t2, NULL);

/* Prints the values of x and y to confirm the operation. */

	printf ("x = %lld\ty = %lld\n", x, y);

	return 0;
}
Code (dual thread version):
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *thread (void *arg)
{
	/*
	This function increments the (long long int) variable pointed to by
	arg until it reaches 100000000 and then returns.
	*/

	long long int *p = (long long int *)arg;
	while (*p < 100000000) (*p)++;
	pthread_exit (NULL);
}

int main (void)
{
	pthread_t t1, t2;
	long long int x = 0, y = 0;

/*
Launch two "thread"s, one with x and one with y as argument, then wait for both to finish.
*/
 
	pthread_create (&t1, NULL, thread, &x);
	pthread_create (&t2, NULL, thread, &y);

	pthread_join (t1, NULL);
	pthread_join (t2, NULL);

/* Print the values of x and y to confirm the operation */

	printf ("x = %lld\ty = %lld\n", x, y);

	return 0;
}
I ran these programs with "/usr/bin/time" on a Compaq Presario V6604 laptop, which has an AMD Turion 64 X2 processor and 1 GB of RAM. The single thread version uses 100% of only one CPU, while the dual thread version uses 100% of both CPUs, as it should. I used "Applications->System tools->System Monitor" to view the CPU usage.

However, most of the time the dual thread version appears to run considerably slower than the single thread version. To confirm this, I used /usr/bin/time to record the execution time. The results are as follows:

Single thread version:
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m1.555s
user 0m1.548s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m1.550s
user 0m1.544s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m1.525s
user 0m1.484s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m1.528s
user 0m1.480s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$


Dual thread version:
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m4.892s
user 0m9.421s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m5.284s
user 0m10.153s
sys 0m0.004s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m5.319s
user 0m10.257s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m0.775s
user 0m1.520s
sys 0m0.000s
[kchat@localhost pthread_spin_lock]$ time ./test
x = 100000000 y = 100000000

real 0m5.015s
user 0m9.669s
sys 0m0.008s
[kchat@localhost pthread_spin_lock]$



The single thread version takes about 1.5 sec on average. The dual thread version takes about 5 sec on average, but sometimes it takes about 0.75 sec, which is the time I expected: the two loops run in parallel, so the total should be roughly half of the single thread time.

I am totally confused.

Thanks
Krish