Encyclopaedia Index

Running Parallel PHOENICS

This article refers to the running of parallel PHOENICS on computers running the Microsoft Windows OS. It requires that the user first install MPI (usually MPICH2); please refer to Chapter 6 of TR110 for instructions on how to prepare your workstation(s) for running parallel PHOENICS.

1. Running the parallel Solver from the VR-Editor

The simplest way to launch parallel EARTH is from the VR-Editor.

When a parallel PHOENICS licence has been purchased, an additional menu item, 'Parallel Solver', will appear under the 'Run' menu option in the VR-Editor. Once selected, the parallel solver option launches a dialog box where the user can specify parameters that will affect how the parallel solver is run.

Number of Processes

The first consideration is the number of processes across which to run the solver. This is selected from the pulldown combo box at the top of the dialog. The pulldown list allows the user to select up to 64 processes, in increments of 2. The user is also free to type any positive integer into this box, for example to select an odd number of processes.

Cluster Host List

The 'Cluster Host List' portion of the dialog enables the user to select which hosts in a cluster are used for computation. Here there are three options:

  1. 'Local Only': the default, will just use cores on the local machine (i.e. the machine on which the instance of VR-Editor is running).
  2. 'Any': will use a computer-assigned distribution of processes across the nodes in the cluster. These nodes must have been previously identified in the cluster.
  3. 'Specify in list': users may select hosts from the scroll list. By default this list should contain those hosts previously identified in the cluster, but one can also add to the list by using the 'Add' button. Alternatively, one can supply a 'Machine List' file which contains a list of those workstations from which to select. This file is simply a text file with the names of the workstations, each on a separate line (see the example below).
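
For example, a machine list file might simply contain (the host names here are purely illustrative):

  cham-cfd1
  cham-cfd2
  cham-cfd3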

This mode of running the parallel solver will always launch the root process on the local machine and a convergence monitoring window will appear on screen (as per the sequential solver).

MPI configuration file

An MPI configuration file is only used to fine-tune the distribution of processes amongst the different computers in a cluster, so it will not be required if running on the local computer only (see the Configuration File section below).

Domain decomposition

When using the default automatic domain decomposition, parallel PHOENICS only differs from sequential when the solver is run: problem set-up and post-processing of results can be done in exactly the same way as for the sequential version. A case that has been run in sequential mode can normally be run in parallel without any changes being made. The output from a parallel PHOENICS simulation will be result and phi files, having the same format as for sequential simulations.

It is also possible to by-pass the automatic domain decomposition algorithm, and to specify how you want to decompose the calculation domain into sub-domains. This can be done by selecting 'Manual' in the Domain decomposition group box on the Run Parallel solver dialog.

When you first switch to manual decomposition the arrangement will match the automatic decomposition that would otherwise be used. Normally automatic decomposition will suffice, but there may be occasions where a different decomposition is preferable to avoid a split at a critical location.

2. Command mode operation

In a Command Prompt window, if the EARTH executable is launched directly, then the sequential solver will be used; to run the parallel solver, the program name 'earexe' is used as an argument to the MPI launch program mpiexec.

A script RUNPAR.BAT [nnodes] is provided. The optional argument [nnodes] indicates the number of processes to be launched on the current computer. The default is to launch two processes.

For example, RUNPAR 2 will execute the MPI command:

  mpiexec -localroot -np 2 \phoenics\d_earth\d_windf\earexe

If a cluster has been defined by smpd, then the command will execute on two processors in the cluster; otherwise it will launch both processes on the local machine.
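
As an illustration only, a minimal RUNPAR-style batch script might look like the sketch below; the RUNPAR.BAT supplied with the installation may differ in detail:

  @echo off
  rem Use the first argument as the number of processes, defaulting to 2
  set NPROCS=%1
  if "%NPROCS%"=="" set NPROCS=2
  rem Launch the parallel solver via MPICH2's mpiexec
  mpiexec -localroot -np %NPROCS% \phoenics\d_earth\d_windf\earexe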

There are also 'run' commands which can be used in conjunction with configuration files; for example, 'runcl4' uses the configuration file 'config4'. Config4 lists the PCs and the number of processes to be used (see the Configuration File section below). The script runcl4 will execute the following command:

  mpiexec -configfile \phoenics\d_utils\d_windf\config4

3. Manual domain decomposition

In most cases, it is sufficient to use the automatic domain decomposition which is provided within the parallel PHOENICS solver. In some cases, though, the user may prefer to choose how the domain is partitioned, i.e. to apply a manual decomposition. To do this, the user can either supply a file PARDAT in the working directory, or add PIL settings to the Q1 file which describe how the domain is to be partitioned.

The PIL logical LG(2) instructs the splitter to by-pass the automatic domain decomposition and to split the domain according to the settings defined in the IG array, as follows:

IG(1) specifies the number of sub-domains in the X-direction;
IG(2) specifies the number of sub-domains in the Y-direction;
IG(3) specifies the number of sub-domains in the Z-direction.

For example, to split the domain into 4 sub-domains (2 in each of the x and y directions and 1 in z), the following statements should be set in the Q1 file:

  LG(2)=T
  IG(1)=2
  IG(2)=2
  IG(3)=1

Warning:
  1. When using manual decomposition, whether set in PARDAT or in the Q1 file, it is important to match the total number of processes indicated in the file with the number of processes requested when starting the solution (in the example above, 2 x 2 x 1 = 4 processes). If there is a mismatch then an error is reported and the run will not go ahead.
  2. Before the computational domain can be split in the Z-direction, one must first set all variables to be solved Whole-Field. If any variables are set to be solved Slab-wise and one attempts to split in Z, then an error is reported and the solution is terminated.

4. Cluster Operation

When running across a cluster, the run will attempt to launch an instance of the solver from the same location on each of the compute nodes. If the default locations are used, this will be C:\phoenics\d_earth\d_windf\earexe.exe. If a Private Earth is to be used, then this also should be copied to the equivalent directory for each of the compute nodes.

When running across a cluster, it is important to consider the working directory on the compute nodes. By default, mpiexec will attempt to launch the process in the equivalent directory on all the workstations. So, if on the head node you are working in c:\phoenics\myprojects\projectbeta, then this directory must also exist on all the workstations in the cluster, otherwise the run will fail.

As it can be difficult to remember always to create the working directory on all the cluster workstations, there is an alternative: one can set up an environment variable PHOE_WORK_DIR on each of the cluster workstations to point to an existing fixed directory,

e.g. PHOE_WORK_DIR=C:\phoenics\mypar_runs

Then all processes (aside from the launching process) will write their output to this location.
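
One way of setting this variable persistently on a compute node is with the standard Windows setx command; the directory name below is just an example and should already exist on that node:

  rem Set PHOE_WORK_DIR for the current user on this compute node
  setx PHOE_WORK_DIR C:\phoenics\mypar_runs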

PLEASE NOTE: The use of PHOE_WORK_DIR is not recommended if you are likely to make multiple parallel runs simultaneously. This is because the second parallel run (and subsequent runs) will overwrite the working files of the first.

The above methods of launching the parallel solver do not allow the user to fix the number of solver instances on each workstation. If that level of control is required, then the user will need to use an MPI configuration file (see below).

Configuration File

The MPI configuration file option gives a more flexible way of launching the parallel solver. Assuming we have PHOENICS installed on each computer in the cluster, the following config file will use the public earexe.exe to run a single process on each of the four computers.

-localroot -np 1 -host cham-cfd1 c:\phoenics\d_earth\d_windf\earexe.exe
-np 1 -host cham-cfd2 c:\phoenics\d_earth\d_windf\earexe.exe
-np 1 -host cham-cfd3 c:\phoenics\d_earth\d_windf\earexe.exe
-np 1 -host cham-cfd4 c:\phoenics\d_earth\d_windf\earexe.exe

The following example launches two processes on each of two computers where PHOENICS is installed only on the head node:

-localroot -np 2 -host cham-cfd1 c:\phoenics\d_earth\d_windf\earexe.exe
-np 2 -host cham-cfd2 \\cham-cfd1\phoenics\d_earth\d_windf\earexe.exe
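
A configuration file such as either of the above is simply a plain text file, which is then passed to mpiexec via its -configfile option; for example, assuming it has been saved as myconfig.txt in the working directory:

  mpiexec -configfile myconfig.txt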

Users should create their own configuration and 'run' files, based on the examples provided, tailored to their own installation. These can either be located in \phoenics\d_utils\d_windf or the local working directory.

Considerations for cluster operation

All Nodes in the cluster should belong to the same Workgroup or Domain, and the user should be logged into each Node on the Cluster using the same Workgroup/Domain User account and password.

A full PHOENICS installation must be made on the head node. A PHOENICS installation is strongly recommended on the other compute nodes, but it is not essential.

  1. If PHOENICS is only installed on the head node then the phoenics folder will need to be shared, with at least Read permissions, for the other compute nodes in the cluster. The share name chosen when the folder is shared is used in the configuration file; in the example file 'config4' above, the share name is 'phoenics' (a sketch of the commands involved is given after this list).
    While it is possible for the compute nodes to refer to the licence and configuration files on the head node, in practice this has led to problems and to the solver unexpectedly stalling. As a minimum, therefore, the following files should be copied onto each of the compute nodes:
    C:\phoenics\d_allpro\phoenics.lic
    C:\phoenics\d_allpro\coldat
    C:\phoenics\d_allpro\config
    C:\phoenics\d_allpro\prefix
    C:\phoenics\d_earth\earcon
    C:\phoenics\d_earth\props
    
    The files get_hostid.bat and lmhostid.exe (in c:\phoenics\d_allpro) will also be needed initially to identify the compute node HOSTID necessary for unlocking the software.
    The environment variable PHOENICS should be set to C:\phoenics on the compute nodes to indicate that the configuration files are stored locally. See TR110 section 6.3.3 for some help in setting environment variables.
    Instead of running the solver directly from the network shared folder (see the shared folder phoenics in the config4 file above), one could also copy the solver to each compute node as
    C:\phoenics\d_earth\d_windf\earexe.exe
    During a run of the parallel solver the processes on the compute nodes will need to read and write some working files. These are generally only of use to the program during a run or perhaps for diagnostic purposes if things go wrong, but they still need to be written somewhere. By default these will be in an equivalent directory to that used to start the run on the head node. So if the run was started from c:\phoenics\d_priv1 on the head node the user will need to make sure that there is a directory of the same name on each of the compute nodes.
    As the files created on the compute nodes may be considered scratch files (i.e. only of short-term value), it is possible to set up an environment variable PHOE_WORK_DIR on each of the compute nodes to specify where these files will be stored. Thus one does not have to remember to create a new working directory on the compute nodes each time the working directory on the head node changes.
  2. If PHOENICS is installed on each compute node (in addition to the head node) then the Workgroup/Domain User account used to log into each compute node must allow read access to all PHOENICS folders, and write access to the working folder (the default is C:\phoenics\d_priv1).
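
As an illustration of the set-up described in item 1 above, commands such as the following could be used from an Administrator Command Prompt; the share name and paths are only examples and should be adapted to the local installation:

  rem On the head node: share the PHOENICS folder read-only as 'phoenics'
  net share phoenics=C:\phoenics /GRANT:Everyone,READ

  rem On each compute node: point the PHOENICS environment variable at the
  rem locally copied configuration files
  setx PHOENICS C:\phoenics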

5. Troubleshooting

Non-fatal error message while running parallel

There is a non-fatal error message that may occur in the Command Prompt window:

Unable to open the HKEY_LOCAL_MACHINE\SOFTWARE\MPICH\SMPD\process\1264 registry key, error 5, Access is denied.

This can occur either if you are running from a non-administrator account or if you have UAC turned on. This error is not usually fatal and the run will continue. The process number (1264) is likely to be different to the one indicated above.

It is possible to avoid this message by editing the registry with regedit.exe, although it is recommended that you ask your Systems Administrator to make any such modification for you. The change required is to the Permissions of the key Computer\HKEY_LOCAL_MACHINE\SOFTWARE\MPICH\SMPD: the user group 'Users' needs to be granted the 'Create Subkey' permission, applied to 'this key and subkeys'.

Unable to start the parallel solver because MPICH2 is not installed correctly

In this case the on-screen output will be similar to:

C:\phoenics\d_priv1>call mpiexec -localonly -np 4 "c:\phoenics\d_earth\d_windf\earexe.exe" 
Unknown option: -d
Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
Connect on sock (host=cham-cfd1.CHAM.local, port=8676) failed, exhaused all end points
Unable to connect to 'cham-cfd1.CHAM.local:8676',
sock error: Error = -1

ReadFile() failed, error 109
unable to start the local smpd manager.

If you have not installed MPICH2, then follow the instructions in Chapter 6 of TR110. If you have already attempted to install MPICH2, it may be that Windows User Account Control (UAC) prevented SMPD from being installed correctly as a service, in which case it will be necessary to install it manually. To do this, use the 'Run as administrator' option to start a Command Prompt window, then navigate to the MPICH2\bin directory. The command smpd -status will check whether SMPD is active; to install SMPD manually, use the command smpd -install, as shown in the session extract below:

C:\Windows\system32>cd \program files\mpich2\bin

C:\Program Files\MPICH2\bin>smpd -status
no smpd running on cham-cfd1.CHAM.local

C:\Program Files\MPICH2\bin>smpd -install
MPICH2 Process Manager, Argonne National Lab installed.

C:\Program Files\MPICH2\bin>smpd -status
smpd running on cham-cfd1.CHAM.local

If the Command Prompt window was not opened as an Administrator, then the install command will fail with the following message:

C:\Program Files\MPICH2\bin>smpd -install
OpenSCManager failed:
Access is denied. (error 5)
Unable to remove the previous installation, install failed.

If you are unable to open a window with Administrator permissions then you may need to refer to your network administrator to set SMPD up for you.

The parallel solver appears to stall on start up

One calls mpiexec to start a parallel run and then there appears to be no further response; the program appears to stall in the Command Prompt window. In the Task Manager process list you find mpiexec.exe, but there are no instances of earexe.exe. If this occurs, it is likely that you are picking up the wrong version of mpiexec.exe. Right-click on the mpiexec.exe process in the Task Manager and select the item 'Open file location'; it is likely this will not be MPICH2\bin. If you have installed the Intel Fortran compiler, for example, it may refer to that compiler's version of MPI. In this case one will need to modify the PATH, so that C:\Program Files\MPICH2\bin (or wherever you have installed MPICH2) occurs before any alternative MPI installations. The Command Prompt output in this case will be similar to:

c:\phoenics\d_priv1>call mpiexec -localonly -np 4 "c:\phoenics\d_earth\d_windf\earexe.exe"
Unknown option: -d
launch failed: CreateProcess(2 -mgr -read 0000021C -write 00000218) on 'cham-prec36.CHAM2006.local' failed, error 2 - The system cannot find the file specified.
ReadFile() failed, error 109
unable to start the local smpd manager.
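
A minimal way of checking this, assuming MPICH2 is installed in C:\Program Files\MPICH2, is to prepend its bin directory to the PATH for the current Command Prompt session before calling mpiexec:

  rem Ensure the MPICH2 mpiexec is found ahead of any other MPI installation
  set PATH=C:\Program Files\MPICH2\bin;%PATH%
  mpiexec -localonly -np 4 "c:\phoenics\d_earth\d_windf\earexe.exe"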

6. Further Information

A copy of the MPICH2 user guide (mpich2-1.4.1p1-userguide.pdf) may be found in the directory \phoenics\d_allpro\d_libs\d_win64\mpi.