This Error Reference is intended for: Any audience.
[c19-15.local][[18870,1],5][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.254.208 failed: Connection refused (111)
connect() to 192.168.255.238 failed: Connection refused (111)
[c43-16.local:11900] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[c43-16.local:11900] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
This is probably caused by a weakness in the system that shows up when a job is assigned nodes with and without InfiniBand at the same time.
Until this is fixed in general, you can avoid the problem by not using the InfiniBand network at all, even on the nodes where it is available. To do so, add the option "--mca btl ^openib --mca btl_tcp_if_include eth0" to mpirun:
mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 256 MyProg.exe
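Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the same settings can be placed in the job script instead of on the mpirun command line. A minimal sketch, assuming a bash job script and the same interface name (eth0) as in the command above:

```shell
# Equivalent of "--mca btl ^openib": exclude the InfiniBand (openib) transport.
export OMPI_MCA_btl="^openib"
# Equivalent of "--mca btl_tcp_if_include eth0": restrict TCP traffic to eth0.
export OMPI_MCA_btl_tcp_if_include="eth0"
# mpirun then picks these settings up without extra flags, e.g.:
#   mpirun -np 256 MyProg.exe
```

Values given on the mpirun command line take precedence over these variables, so the two forms should not be mixed with conflicting settings.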
Alternatively, you can avoid being assigned InfiniBand nodes altogether by adding the PBS option ":gige" to your node request.
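A sketch of such a request in a job script, assuming the usual Torque/PBS "nodes" resource syntax. The node count (32) and cores per node (8) are placeholders chosen for illustration, not values taken from this page:

```shell
#!/bin/bash
# Request 32 nodes with 8 cores each (placeholder numbers), restricted to
# nodes that carry the "gige" property, i.e. nodes without InfiniBand.
#PBS -l nodes=32:ppn=8:gige
```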
You should also consider using the InfiniBand network, since it may significantly improve the performance of your code. If your job runs only on InfiniBand nodes, you will not get the "MPI_INIT" error.
To run only on InfiniBand nodes, add the PBS option ":ib" to your node request.
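A sketch of a request for InfiniBand nodes only, again assuming the usual Torque/PBS "nodes" resource syntax, with a placeholder node count (32) and cores per node (8) that are not taken from this page:

```shell
#!/bin/bash
# Request 32 nodes with 8 cores each (placeholder numbers), restricted to
# nodes that carry the "ib" property, i.e. nodes with InfiniBand.
#PBS -l nodes=32:ppn=8:ib
```

With such a request the "--mca btl ^openib" workaround should not be needed, since all nodes in the job share the same network.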
See also Run script example for Stallo