
CLIC usage

- experience of users for users -





This is a collection of notes on issues I encountered during the first weeks and months of testing on CLIC.
It is something like an FAQ, but there is no guarantee that every point is correct.

If anyone has other or additional experience of interest to other users, please let me know (m.pester@mathematik.tu-chemnitz.de).

Note:
There may be changes (version numbers, paths, ...) due to system upgrades. Thus, parts of this page may become obsolete some day.
The author is not responsible for the contents of other pages that are linked here.


What do I need to run a program on CLIC?

  1. You need a valid login for the domain hrz.tu-chemnitz.de and an account (=project name) for CLIC
    (refer to URZ-Benutzerservice)
  2. Define a "job" (interactive or batch job, see below).
  3. Login to any host (HP, Sun or Linux) in the domain hrz.tu-chemnitz.de.
  4. You may use xpbs to define a lot of options for your job ...
    WARNING: xpbs can be considered a good idea, but it would need improvements to become a real "user interface".
    Please have a look at those tangled xpbs windows and find out what the options mean :-)
  5. Submit your job to the queue system (PBS = Portable Batch System):
      qsub -I my_interactive_job
    or
      qsub my_batch_job

    Note: The former command  pbs_qsub  (from /uni/global/bin) is deprecated and has been replaced by  qsub  (from /usr/local/bin).

  6. It is strongly recommended to have the following features (possibly by changing your login scripts)

Interactive jobs

"Submitting" an interactive job by  qsub -I ... will give you a shell  in your current terminal. If you use xpbs to submit an interactive job, xpbs opens a window (xterm) to execute that shell. The file $PBS_NODEFILE contains the list of hostnames (nodes) assigned to your job. The interactive shell runs on the first of them.

A simple example:
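For instance, assuming 8 nodes and a two-hour time limit (both values are placeholders, not prescribed settings):

  # request an interactive shell on 8 nodes for two hours
  qsub -I -l nodes=8,walltime=2:00:00

  # once the shell has started, the assigned nodes are listed in
  cat $PBS_NODEFILE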

Batch jobs

If you can redirect all input and output of your program to files, you should use the real batch mode to run it.
The job definition file has to contain some PBS-specific options, written as comments for the shell (#PBS -option),
and all the shell commands to be executed on the first node of your subcluster.

A simple example:
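For instance, a job file  my_batch_job  could look like this (resource values, job name and program are placeholders):

  #!/bin/sh
  # PBS options written as shell comments: resources, job name, join stdout/stderr
  #PBS -l nodes=16,walltime=2:00:00
  #PBS -N my_batch_job
  #PBS -j oe
  # the commands below are executed on the first node of the subcluster
  cd $PBS_O_WORKDIR            # directory the job was submitted from
  ./myprogram < input.dat > output.log

Submit it with  qsub my_batch_job .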

Access to a special queue

In an "emergency" case (defect switch) we had access to CLIC by a special queue only. In such a case, your batch job should contain, e.g.,

 #PBS -q clicDefectQ@clic0a1.hrz.tu-chemnitz.de

and use only hostfiles with the suffix eth0 instead of eth1.
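A hedged sketch of such a job, using the init scripts described further down (there the argument "eth0" selects the service network; node count and program are placeholders):

  #PBS -q clicDefectQ@clic0a1.hrz.tu-chemnitz.de
  #PBS -l nodes=16,walltime=2:00:00
  clic_init_lam eth0 <<'EOF'
  mpirun -np 16 <myexecutable>
  EOF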

Using LAM-MPI on CLIC

For general information concerning the usage of LAM-MPI refer to this (German) document.

On CLIC, there are three (marginally different) versions of LAM-MPI installed under /usr/local/packages. I decided to use the TCP version, since the others seem to have advantages only for dual-processor boards.

Notes on LAM-MPI 6.5.1: With very long messages and highly parallel simultaneous exchange, LAM 6.3.2 managed to transfer 140 Mbit/s per node, whereas LAM 6.5.1 did not exceed 128 Mbit/s per node.
The most recently installed version is LAM-MPI 6.5.6:
 MPIHOME=/usr/local/packages/lam-rpi-tcp-6.5.6
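A minimal sketch of using this version by hand (node count and program are placeholders; the eth1 host file is the one mentioned in the Remarks, and most of these steps are automated by the clic_init_lam script described below):

  # put the chosen LAM version first in your search path
  MPIHOME=/usr/local/packages/lam-rpi-tcp-6.5.6
  PATH=$MPIHOME/bin:$PATH; export MPIHOME PATH

  # boot the LAM daemons on the nodes of your subcluster
  lamboot -v ${PBS_NODEFILE}.lam.eth1

  mpirun -np 16 <myexecutable>

  # shut the daemons down when you are done
  wipe -v ${PBS_NODEFILE}.lam.eth1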

Remarks:

Using MPICH on CLIC

MPICH is installed locally on CLIC under /usr/local/packages/mpich-1.2.4.ssh/. Just as described above for LAM-MPI, you can do the following:
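A minimal sketch of these steps (node count and program are placeholders):

  # put the local MPICH installation first in your search path
  MPIHOME=/usr/local/packages/mpich-1.2.4.ssh
  PATH=$MPIHOME/bin:$PATH; export MPIHOME PATH

  # no separate boot step: mpirun itself starts the processes via ssh
  mpirun -machinefile $PBS_NODEFILE -np 16 <myexecutable>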

Using PVM on CLIC

PVM can be used from /afs/tucz/project/sfb393/pvm3 for the PVM architecture LINUX. The PVM daemon is started by
 $PVM_ROOT/lib/pvm [-n<master_hostname>] <hostfile>
where <hostfile> may be $PBS_NODEFILE or ${PBS_NODEFILE}.lam.eth1 as described above in the Remarks.
The flag -n<master_hostname> is important for the correct use of the communication network (eth1). You can obtain <master_hostname> via `head -1 ${PBS_NODEFILE}.lam.eth1`.
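For example (using the commands described above; at the PVM console prompt, "quit" leaves the console with the daemons still running, "halt" shuts the whole virtual machine down):

  MASTER=`head -1 ${PBS_NODEFILE}.lam.eth1`
  $PVM_ROOT/lib/pvm -n$MASTER ${PBS_NODEFILE}.lam.eth1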

Typical problems using PVM:

A few little scripts

For simplification there are a few shell scripts to initialize a subcluster either for LAM-MPI or for MPICH (and for PVM, too):
  clic_init_lam      [ < input_file ]
  clic_init_mpich    [ < input_file ]
  clic_init_mpichmpd [ < input_file ]
  clic_init_pvm [-x] [ < input_file ]
By default they will select the corresponding machine file (using the communication network), start the corresponding daemon and then (in case of success) run a shell interactively. Before entering the subshell, the scripts will add the bin directory of the appropriate MPI version to the top of your search path (if it is not there yet).
You may specify the argument "eth0" if you explicitly want the service network instead of the communication network.
If you leave the shell (exit) the daemons are killed and temporary files are deleted.
For simplicity, each of the scripts defines an environment variable CLIC_NUMNODES with the number of nodes defined in $PBS_NODEFILE. This variable is available for the subshell.
For usage in batch mode you may redirect the input from a file which contains the mpirun command and data for your program.
Of course, you may also write it in your batch job, e.g.
  clic_init_lam <<EOF
  mpirun -np 16 <myexecutable>
  EOF
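Since CLIC_NUMNODES is set by the scripts for the subshell, the node count need not be hard-coded; note the quoted delimiter, which keeps the outer shell from expanding the variable too early:
  clic_init_lam <<'EOF'
  mpirun -np $CLIC_NUMNODES <myexecutable>
  EOF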
Another helpful script is the following, which executes a specified command via ssh on each of the nodes listed in $PBS_NODEFILE:
  clic_chk [-b] [command]
The flag "-b" means to execute the ssh commands in the background instead of one by one. If no command is specified, clic_chk will only echo an "OK" from each node (to check whether ssh works).
As a special case of clic_chk you may run the script
  clic_chk_load
which extracts those nodes from $PBS_NODEFILE that have a load average of more than 0.10. This will take a while for a large number of nodes; the program pload (see below) may be better for a quick test.
If you want to check the connection via another machine file than $PBS_NODEFILE, please use
  chkhosts [-b] machine_file [command]
instead.
The script  clic_init_pvm  is similar to those for MPI. The flag  -x  is for interactive use only, since it opens an additional xterm running the PVM console. Hence, you need a working DISPLAY connection (xhost [+]).

The current state of CLIC may be displayed by

   clic_show
The output of this script looks like this (you may also see the current state here) :
Server           Max Tot Que Run Hld Wat Trn Ext Status
---------------- --- --- --- --- --- --- --- --- ----------
clic0a1.hrz.tu-c   0  41  24  17   0   0   0   0 Scheduling

clic0a1.hrz.tu-chemnitz.de: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1075.clic0a1.hr fci      clicNode thin.job    14190  16  --    --  2000: R 189:1
1269.clic0a1.hr frank    clicNode pbs_clc.sh  11888   1  --    --  150:0 R 96:23
1270.clic0a1.hr frank    clicNode pbs_clc.sh  10806   1  --    --  150:0 R 96:15
1272.clic0a1.hr klpa     clicNode STDIN       13630 111  --    --  250:0 R 94:20
1309.clic0a1.hr tnc      clicNode inter.sh    27726   1  --   512b 250:0 R 87:35
1310.clic0a1.hr tnc      clicNode inter.sh    21408   1  --   512b 250:0 R 87:20
1333.clic0a1.hr frank    clicNode pbs_clc.sh   1314   1  --    --  150:0 R 22:40
1340.clic0a1.hr frank    clicNode pbs_clc.sh  10339   1  --    --  150:0 R 21:19
1346.clic0a1.hr ikondov  clicNode set1a_2-d1  15939  48  --    --  25:00 R 17:00
1350.clic0a1.hr mibe     clicNode bdmpitest     --  238  --    --  08:00 Q   -- 
1351.clic0a1.hr klpa     clicNode STDIN       12615  44  --    --  250:0 R 14:39
1352.clic0a1.hr ikondov  clicNode set1a_34-d  32096  48  --    --  25:00 R 10:23
1353.clic0a1.hr ikondov  clicNode set1a_34-d  20435  48  --    --  25:00 R 10:23
1355.clic0a1.hr ikondov  clicNode set1a_34-d  19408  48  --    --  25:00 R 10:24
1356.clic0a1.hr ikondov  clicNode set1a_34-d  10725  48  --    --  25:00 R 10:22
1357.clic0a1.hr ikondov  clicNode set1a_34-d   9506  48  --    --  25:00 R 10:23
1358.clic0a1.hr ikondov  clicNode set1a_34-d  19348  48  --    --  25:00 R 08:03
1359.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1360.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1361.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1362.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1363.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
 ..........
1379.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1380.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1381.clic0a1.hr ikondov  clicNode set1a_34-d    --   48  --    --  25:00 Q   -- 
1382.clic0a1.hr pester   clicNode STDIN         --    4  --    --  04:00 R 01:07 
    522 nodes  in use,
      0 nodes  free,
      7 nodes  offline.

Not a script but a small program may be used to find out whether another user has left some of your nodes "unclean":
  mpirun -np ... pload.CLIC.lamXXX   (for LAM-MPI, XXX=632 or 656 depending on the current LAM version)
  mpirun -np ... pload.CLIC.mpich    (for MPICH)
This program runs for a few seconds and then shows a time diagram with one row per node. Nodes that get much more CPU time than others should be inspected in order to find hanging processes (please send a message to clicadmin if you find such processes of other users, or system processes such as klogd).

Where can you find the scripts?
  /afs/tucz/project/sfb393/bin/
  or /usr/local/bin/
    (some of them modified by Mike Becher; more options and help)
 

Local compiling and linking?

It is very annoying if you have to wait for a CLIC node assigned by PBS when all you want is to compile and link your program.
In my tests I found no problems using locally installed versions of LAM-MPI or MPICH to compile and link the programs on my desktop; the executable then runs on CLIC. It is also no problem that the Linux distributions differ (local: S.u.S.E., CLIC: RedHat).
The local installations (not really "local") can be used by anyone else:
 
 
LAM-MPI 6.3.2 /afs/tucz/project/sfb393/packages/lammpi.CLIC
LAM-MPI 6.5.9 /afs/tucz/project/sfb393/lammpi
MPICH 1.1.1 /afs/tucz/project/sfb393/mpich
PVM 3.4 /afs/tucz/project/sfb393/pvm3

NOTE for LAM-MPI:
By default mpif77 calls "f77". In our local installation, however, f77 is not usable, so I modified the mpif77 script to use g77 by default.
You may check the resulting command line with

   mpif77 -showme

Using our private libraries?

The libraries we have been developing and using for several years are also usable for CLIC. The library path is
 /afs/tucz/project/sfb393/FEM/libs/$archi
where $archi is an environment variable defining the architecture and/or the parallel system to use.
Here is an overview for Linux:
 
(For each value of archi: where to run "make", and which hypercube communication library to use with which message passing library.)

archi=LINUX       any Linux computer (*.mathematik)
                    MPICH:    libMPIcom.a
                    PVM:      libCubecom.a
archi=LINUX_lam   any Linux computer (*.mathematik)
                    LAM-MPI:  libMPIcubecom.a or libMPIcom.a
archi=CLIC        CLIC nodes (clicxxxx.hrz)
                    LAM-MPI:  libMPIcubecom.a or libMPIcom.a
                    MPICH:    libMPICHcom.a
                    PVM:      libCubecom.a
archi=LinuxPGI    Linux with access to /afs
                    LAM-MPI:  libMPIcubecom.a or libMPIcom.a
archi=Intel       Linux with access to /afs (the compiler needs a handful of environment variables)
                    LAM-MPI:  libMPIcubecom.a or libMPIcom.a
                    MPICH:    libMPICHcom.a or libMPICHcubecom.a
What else do you need? In each case, have a look at the file
/afs/tucz/project/sfb393/FEM/libs/$archi/default.mk
to verify the default paths and variables (which you may override in your Makefile).
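A hedged sketch of the typical steps (assuming your Makefile reads $archi from the environment, as the description above suggests):

  archi=CLIC; export archi
  # inspect the default paths and variables for this architecture
  cat /afs/tucz/project/sfb393/FEM/libs/$archi/default.mk
  make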

An attempt to compare ...

Each message passing system has some particular features. I will try to split them into advantages and disadvantages:
 
LAM-MPI
  Advantages:
  • the communication network (eth1) can be used
  • after lamboot the mpirun command is very quick
  • very good behavior of MPI_sendrecv (up to 140 Mbit/s, no loss of performance caused by the switches for more than 100 nodes)
  Disadvantages:
  • problems with more than 228 nodes (fixed)
    (workaround: mpirun -lamd ...)
  • lamboot and wipe need "some time" and may hang up if one node has trouble
    (because they use ssh sequentially)
  • bad implementation of global communication (MPI_Allreduce, ...)
  • the directory where the executable is started from must be readable for urz:clicnodes or system:anyuser (no ssh connection - no AFS token)
MPICH
  Advantages:
  • good implementation of global communication (MPI_All...)
    [but does not use the duplex mode of the ethernet cards (so only up to 100 Mbit/s ?)]
  • with serv_p4 the first two disadvantages listed below disappear
  Disadvantages:
  • mpirun takes a long time (as lamboot does for LAM-MPI)
  • mpirun creates a lot of ssh processes on node_0
  • programs use 100% CPU time while they are waiting in send/recv
  • memory problems for very long messages (P4_GLOBMEMSIZE must be increased)
PVM
  Advantages:
  • can use 512 nodes or more (if they are available :-)
  • the PVM daemon starts faster than that of LAM-MPI
  Disadvantages:
  • communication performance worse than MPI
    (total bandwidth is 30...60 % of MPI)
libCubecom.a
  Advantages:
  • needs only send and recv from any message passing library (to be used for PVM)
  Disadvantages:
  • number of nodes must be a power of 2
libMPIcom.a or libMPICHcom.a
  Advantages:
  • uses more features of MPI
  • "Cube" communication works with any number of nodes
  Disadvantages:
  • bad global communication performance for LAM-MPI
libMPIcubecom.a
  Advantages:
  • private implementation of global communications (Cube_DoD, Cube_DoI, Cube_Cat) using only MPI_sendrecv
    (best performance with LAM-MPI)
  Disadvantages:
  • number of nodes must be a power of 2

DISPLAY Problems

[since ∼ March 2006]
Due to a software upgrade concerning OpenSSH and the X server, some problems have occurred with receiving graphical output from a parallel program running on CLIC.

Reason: ssh tunneling for X11 data does not work backwards from CLIC, and a new default configuration of the X server on local machines rejects any connection other than such secure tunnels.

Workaround: You must "forward" the DISPLAY variable that was obtained via ssh on the compute server where you logged in from outside.
Assume "remhost" is the hostname of this compute server; then the value of $DISPLAY will be something like remhost:xx.0. You can forward this variable to CLIC as an argument of qsub:
qsub -I ... -v DISPLAY=$DISPLAY
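Putting it together (the compute server name and the node count are placeholders):

  # on your workstation: log in to a compute server with X11 forwarding
  ssh -X remhost.hrz.tu-chemnitz.de
  # on remhost: check which DISPLAY the ssh login has set up
  echo $DISPLAY
  # pass exactly this value on to the interactive job on CLIC
  qsub -I -l nodes=4 -v DISPLAY=$DISPLAY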

Last Changes

Fakultät für Mathematik, TU Chemnitz
Matthias Pester, 12.12.2000