- 10-20-2009 #1 Just Joined!
mpiexec (+mpdboot, mpdcheck...) problem
Hello everyone!
I am trying to run my parallel program (using mpich2) on 9 machines, each with 2 Opteron processors. I access all machines via ssh, and I can ssh from one machine to another without a password.
Running mpdboot as described in the documentation fails with:
mpdboot_lx64a170 (handle_mpd_output 374): failed to ping mpd on lxsrv171; recvd output={}
I tried mpdcheck -l to see what would happen and it didn't produce any output (is this good or bad?)
When I 'manually' started mpd on machines lxsrv171 to lxsrv178 with:
mpd -n -h host -p port
(where host and port are the values I got from mpdtrace -l on lxsrv170, the machine I call mpiexec from), execution was finally possible. However, it did not give the expected results: it seems that most of the processes are not communicating with each other.
(I tried a simple "ring" program to make sure this is not due to my code, but it behaves exactly the same.)
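To separate a plain network problem (firewall, wrong interface, name resolution) from an mpd problem, a generic TCP reachability check can be run from each of the other machines against the host and port that mpdtrace -l reports. The following is only a minimal sketch: it is not part of mpich2 and does not replace mpdcheck, the function name can_connect is made up for illustration, and the host and port in the usage comment are placeholders.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests basic reachability; it says nothing about whether
    the process listening on that port is actually an mpd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example use (placeholder values, substitute the host and port
# printed by `mpdtrace -l` on the machine where the first mpd runs):
#   print(can_connect("lxsrv170", 12345))
```

If this returns False from some machines but True from others, the problem is likely in the network setup between those machines rather than in the MPI program itself.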
BTW, my machinefile looks like this:
lxsrv170:2
lxsrv171:2
lxsrv172:2
...
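As a sanity check on the file itself, the host:count format above can be parsed and summed to see how many process slots it provides. This is a small sketch under assumptions: the helper name parse_machinefile is made up, the sample contains only the three entries shown above (the elided lines are not guessed), and treating a missing count as 1 and '#' as a comment marker are assumptions rather than something taken from the mpich2 documentation.

```python
def parse_machinefile(text):
    """Parse 'host' or 'host:nprocs' lines into (host, nprocs) pairs."""
    entries = []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, count = line.partition(":")
        # Assumption: a host listed without a count provides one slot.
        nprocs = int(count) if count.strip() else 1
        entries.append((host.strip(), nprocs))
    return entries


# Only the entries visible in the post; the remaining hosts are omitted.
sample = """\
lxsrv170:2
lxsrv171:2
lxsrv172:2
"""

entries = parse_machinefile(sample)
total_slots = sum(nprocs for _, nprocs in entries)
print(entries)
print("total slots:", total_slots)
```

With all 9 machines listed at 2 processes each, the total would come to 18, which is the number of processes the file has room for.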
Has anyone had a similar problem?
I would be most grateful if you could help.

