CASUP FAQ
For any questions, bug reports, feedback or other comments, please submit a ticket.
A compiler that supports Fortran 2008 coarray features is required. If the compiler also supports Fortran 2015 coarray features, particularly collectives, then a richer set of capabilities will be available from CASUP.
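As a quick check of compiler support (a standalone snippet, not part of CASUP), the following program uses both a coarray and the co_sum collective, so it should build and run only if both feature sets are available:

! Minimal check: a scalar coarray (Fortran 2008) plus the co_sum
! collective (from the Fortran 2015/2018 additions).
program caf_check
  implicit none
  integer :: n[*]              ! scalar integer coarray
  n = this_image()
  call co_sum( n )             ! collective sum over all images
  if ( this_image() .eq. 1 ) write (*,*) "sum of image numbers:", n
end program caf_check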
Module cgca_m2mpiio.f90 (the source in the svn repo: cgca_m2mpiio.f90) needs an MPI library.
Typically an MPI library will be installed as part of the compiler or runtime library installation, e.g. an OpenCoarrays installation will automatically check for and install some MPI distribution, possibly MPICH or OpenMPI. The Intel compiler is probably best used with the Intel MPI libraries, although it is possible to use some other MPI distribution.
Module cgca_m2hdf5.f90 (the source in the svn repo: cgca_m2hdf5.f90) needs an HDF5 library.
Module cgca_m2netcdf.f90 (the source in the svn repo: cgca_m2netcdf.f90) needs a NetCDF library.
Further software is required for some platforms. For example, Makefile-bc3-mpiifort-tau (the source in the svn repo: Makefile-bc3-mpiifort-tau) is for building CASUP with TAU instrumentation, for profiling and tracing analysis.
On unix/linux use the svn checkout command, which can be abbreviated to svn co. If you just want to use CASUP, you can use either the svn or the https protocol. However, if you have commit access to CASUP, you must use the https protocol! The exact commands for each protocol are helpfully suggested by SourceForge.
You can pull the whole tree into some directory zzz, e.g.

mkdir zzz
svn co https://svn.code.sf.net/p/cgpack/code/ zzz

or

mkdir zzz
cd zzz
svn co https://svn.code.sf.net/p/cgpack/code/ .

or you can pull just a particular directory, e.g. head:

svn co https://svn.code.sf.net/p/cgpack/code/head .

or a release:

svn co svn://svn.code.sf.net/p/cgpack/code/releases/2.9/ .
Use an ssh tunnel. A lot of help on this is available online, so let's just look at some simple examples.
A typical example is when you want to use CASUP
on a remote system, e.g. an HPC system, which has
some restrictions on network traffic.
If you want to use the svn protocol, then from your terminal you have to establish an ssh tunnel to the remote host, e.g.
ssh -R12345:svn.code.sf.net:3690 <username>@<system>
where 3690 is the standard subversion port, and 12345 is an arbitrarily chosen free port.
This command should log you into the remote host
and establish the tunnel.
Then from the remote host do e.g.
svn co svn://localhost:12345/p/cgpack/code/ .
where the port number must match that given in the ssh command.
If you need to use the https protocol, then you need to use the standard https port 443:
ssh -R12345:svn.code.sf.net:443 <username>@<system>
Then from the remote host do e.g.
svn co https://localhost:12345/p/cgpack/code/ .
We are working on an autotools-based build system, but this hasn't been completed yet. In the meantime, there is a collection of Makefiles for different platforms:
Makefile-archer
Makefile-bc
Makefile-bc3-ifort-shared
Makefile-bc3-mpiifort
Makefile-bc3-mpiifort-tau
Makefile-bc3-oca
Makefile-FreeBSD
Makefile-mpiifort-scorep
If there is a Makefile for your exact platform, then just use it, e.g. on FreeBSD with OpenCoarrays installed:
make -f Makefile-FreeBSD
The build will produce a number of object files, *.o, the unix library archive libcgpack.a, and, on some platforms, also module and submodule files, *.mod and *.smod. Refer to the online documentation on the provided Makefiles for more details.
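As a minimal sketch of how these build products might be used (the program name, paths and build command below are placeholders, assuming the OpenCoarrays caf compiler wrapper), a user program pulls in the library via the top level module cgca and links against libcgpack.a:

! myprog.f90 - hypothetical minimal program using CASUP.
! Possible build command (paths are placeholders), e.g. with OpenCoarrays:
!   caf myprog.f90 -I<path to *.mod and *.smod> -L<path to libcgpack.a> -lcgpack -o myprog.x
program myprog
  use cgca                     ! top level CASUP module
  implicit none
  if ( this_image() .eq. 1 ) write (*,*) "CASUP linked, images:", num_images()
end program myprog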
This error is from the GCC Fortran compiler, gfortran,
which prior to version 7 did not support
coarrays of derived type with allocatable or pointer
components.
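For illustration, here is a minimal sketch (not taken from CASUP) of the kind of declaration that such compilers reject:

program coarray_demo
  implicit none
  ! A coarray of a derived type with an allocatable component:
  ! gfortran prior to GCC 7 rejects this declaration.
  type :: t
    integer, allocatable :: a(:)
  end type t
  type( t ) :: x[*]
  allocate( x%a(10) )
  if ( this_image() .eq. 1 ) write (*,*) "allocated size:", size( x%a )
end program coarray_demo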
The obvious solution is to upgrade to GCC 7. If this is not possible, a smaller subset of the CASUP library routines can be built by excluding the modules and submodules which throw this error.
At present these are module cgca_m3pfem (source code: cgca_m3pfem.f90) and submodule m3pfem_sm1 (source code: m3pfem_sm1.f90). You'd also need to comment out the cgca_m3pfem module from the top level module cgca:

!use cgca_m3pfem
Yes. Look at these tests:

ABM, source code: testABM.f90
ABN, source code: testABN.f90
ABO, source code: testABO.f90
ABV, source code: testABV.f90
ABW, source code: testABW.f90
ACD, source code: testACD.f90
ACF, source code: testACF.f90
You'll see calls to CASUP routines which in turn call MPI/IO directly, or call NetCDF or HDF5 routines, which themselves use MPI/IO.
For example, cgca_pswci (source: m2out_sm2_mpi.f90) uses raw MPI/IO. This CASUP routine is called in tests ABM, ABN, ABO, ABV, ABW and ACD.
CASUP routine cgca_pswci2 uses MPI/IO. Routine cgca_pswci3 uses NetCDF. Routine cgca_pswci4 uses HDF5. All 3 routines are used in test ACF.
The 3 compilers, Cray, Intel and OpenCoarrays, seem to work with coarray/MPI code, although Intel does not officially support this.
Some tests, e.g. testABK (the source in the svn repo: testABK.f90), use CrayPAT (the Cray Performance Analysis Tool) API calls, such as PAT_region_begin or PAT_region_end. Clearly such tests can be built only on Cray platforms.
To link such tests into executables, the Cray module perftools must first be loaded. Otherwise you see errors such as these on linking:
ftn testABK.o -o testABK.x testaux.o -L<path to your CASUP library> -lcgpack
cgpack/head/tests/testABK.f90:8: undefined reference to `pat_region_begin_'
cgpack/head/tests/testABK.f90:144: undefined reference to `pat_region_end_'
cgpack/head/tests/testABK.f90:8: undefined reference to `pat_region_begin_'
cgpack/head/tests/testABK.f90:158: undefined reference to `pat_region_end_'
After the perftools module has been loaded, the linking should proceed fine:

uzi@eslogin001:~/cgpack/head/tests> module load perftools
uzi@eslogin001:~/cgpack/head/tests> make -f Makefile-archer
ftn testABK.o -o testABK.x testaux.o -L/home/ecse0505/ecse0505/uzi/lib -lcgpack
uzi@eslogin001:~/cgpack/head/tests>
Tests which call the getcodim subroutine from module testaux.f90 take 2 integer arguments from the command line. At present these are tests from AAA to ABB, from ABD to ABL, and from ABN to ABS.
For example, test AAF can be run with the GCC/OpenCoarrays cafrun launcher with command line arguments 2 and 2:
cafrun -np 16 ./testAAF.x 2 2
These arguments are codimensions 1 and 2.
Codimension 3 is calculated as the total number
of images divided by codim1 and by codim2.
If codimension 3 is not a positive integer,
the program aborts.
In this example the 3rd codimension is 16 / (2*2) = 4.
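This logic can be sketched as a small standalone program (an illustration only, not the actual getcodim code; the program and variable names here are made up):

! Codimension 3 = num_images() / (codim1*codim2); the division must be exact.
program codim_check
  implicit none
  integer :: c1, c2, c3
  c1 = 2 ; c2 = 2                   ! codimensions 1 and 2 from the command line
  if ( mod( num_images(), c1*c2 ) .ne. 0 ) &
    error stop "num_images() is not divisible by codim1*codim2"
  c3 = num_images() / ( c1*c2 )     ! e.g. with 16 images: 16 / (2*2) = 4
  if ( this_image() .eq. 1 ) write (*,*) "codimension 3:", c3
end program codim_check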
With the Intel compiler, when running on a single node, i.e. with only shared memory, you can launch simply as:
./testAAF.x 2 2
Other tests do not need command line arguments. If you want to use the Intel compiler with distributed memory, first you need to compile in a special way and create a coarray config file. The command line arguments are written to that config file, and then you can launch the test simply as:
./testAAF.x
On Cray use aprun with the usual job options, such as -n, -N, -S, -d, -T, etc., and give codimensions 1 and 2 on the command line, e.g.:
#PBS -l select=1000
aprun -n 24000 -N 24 -S 12 -d 1 -T ./testAAF.x 40 30
This example is from ARCHER, where nodes have 24 cores. The 3rd codimension will be 24000 / (40*30) = 20.
Just choose an appropriate job.*.pbs file from the tests/templates directory, e.g. job.mpiifort.pbs:
$ cat templates/job.mpiifort.pbs
#!/bin/bash --login
#$Id: job.mpiifort.pbs 407 2017-05-16 15:23:59Z mexas $
#
# mpiifort on PBS
#
# submit with e.g.
# qsub -q testq -j oe -l walltime=00:10:00,nodes=2:ppn=16 job

# Load the Intel module
module add languages/intel-compiler-16-u2
module list

# Set some helpful vars
EXE=testABW.x
CACONF=xx14.conf
CONFILE=nodes

# Switch to current working directory
cd $PBS_O_WORKDIR

# Prepare the MPI machine file and calculate the number of procs
cat $PBS_NODEFILE > $CONFILE
NNODES=`cat $PBS_NODEFILE | wc -l`

# Prepare conf file
echo -genvall -genv I_MPI_DEBUG=2 -genv I_MPI_FABRICS=shm:dapl \
 -machinefile ./$CONFILE -n $NNODES ./$EXE > $CACONF

# Run
echo "START:" `date`
./$EXE
echo "END: " `date`
This PBS script is for the mpiifort (Intel Fortran over MPI) compiler/libraries. The executable file is testABW.x. There are some Intel MPI envars, which are set via -genv. The comment at the top gives an example qsub command to submit this file to PBS. Adjust to your environment!
You might get runtime errors such as:
ERROR: cgca_sld/cgca_m3sld: allocate( array ), err stat: 4205 err message: The program was unable to request more memory space.
This particular message is from Cray. On other systems the STAT value and the error message might be different, but the problem is that you don't have enough memory to allocate further arrays. At this point you are probably wondering: how do I predict the memory usage of a CASUP program?
No easy answer can be given, but there are 2 possible strategies leading to the lower and the upper bound estimates. The lower bound is based on the input parameters, such as the box size, the mean grain size, the spatial resolution, the number of information layers in the space array, etc. The upper bound is based on a successful completion of a CASUP program and on collecting the job memory usage stats.
First, the lower bound. The total number of CA cells can be estimated from the CA box dimensions, bsz0, the mean grain size, dm, and the spatial resolution (cells per mean grain), res. These particular variable names are taken from test ABV, source code: testABV.f90. Consider this fragment:
! physical dimensions of the box, assume mm
bsz0 = (/ 4.0, 3.0, 5.0 /)
! mean grain size, linear dimension, e.g. mean grain diameter, also mm
dm = 1.0e-1
! resolution
res = 1.0e5
The box volume, V, is the product of the 3 box dimensions, i.e. V = 4.0 * 3.0 * 5.0 = 60 mm^3. The volume of the mean grain, Vm, is roughly dm**3, or Vm = 1.0e-3 mm^3. There are roughly V/Vm = 6.0e4 grains in the model, or, multiplying by the resolution, res, there are roughly 6.0e9 cells in the model. Then divide by the number of images, e.g. if using a single XC40 or XC30 node with 24 cores, you probably want to have 24 images. So each image will have roughly 6.0e9 / 24 = 2.5e8 cells.
Then consider that cells are stored in the space array:

integer( kind=iarr ), allocatable :: space(:,:,:,:) [:,:,:]

where iarr is the integer kind for microstructure arrays, the first 3 dimensions of space are the spatial dimensions of the CA space "shoe-box", and the 4th dimension of space is the number of information layers of CA, e.g. 2 (one for microstructure, and the second for fracture information). If iarr uses 4 bytes, then space on each image will use roughly 2.5e8 * 4 * 2 = 2.0e9 bytes, or 2GB. That is a bare minimum.
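The same arithmetic can be written down as a small standalone program (a sketch only; apart from bsz0, dm and res, the variable names below are made up here):

! Lower bound memory estimate, following the reasoning above.
program mem_estimate
  implicit none
  real :: bsz0(3), dm, res, vol, cells, cells_img, bytes_img
  integer :: nimgs, nlayers, bytes_per_cell
  bsz0 = (/ 4.0, 3.0, 5.0 /)     ! box dimensions, mm
  dm   = 1.0e-1                  ! mean grain size, mm
  res  = 1.0e5                   ! cells per mean grain
  nimgs = 24                     ! images, e.g. a single 24-core node
  nlayers = 2                    ! information layers in space
  bytes_per_cell = 4             ! if iarr uses 4 bytes
  vol       = bsz0(1) * bsz0(2) * bsz0(3)          ! 60 mm^3
  cells     = vol / dm**3 * res                    ! ~6.0e9 cells in the model
  cells_img = cells / real( nimgs )                ! ~2.5e8 cells per image
  bytes_img = cells_img * bytes_per_cell * nlayers ! ~2.0e9 bytes, i.e. ~2GB
  write (*,*) "space per image, GB:", bytes_img / 1.0e9
end program mem_estimate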
If using solidification or fracture routines, which is most likely in any real CASUP program, then you'll need to create temp arrays of the same size as space, although with only a single information layer. So for our example this adds another 1GB, i.e. the absolute minimum total is 3GB.
However! Coarray variables, at least on Cray systems, are allocated from the symmetric heap, i.e. 2GB for space, while the temp arrays are not coarrays and are allocated from the standard heap memory. On Cray you need to define the envar XT_SYMMETRIC_HEAP_SIZE to specify how much memory on each PE should be dedicated to coarray variables.
An example: the Hazel Hen XC40 system has 128GB of memory per 24-core node, i.e. about 5.3GB per PE. So setting export XT_SYMMETRIC_HEAP_SIZE=3g will leave 3GB for coarray variables and 2.3GB for heap memory objects. This is roughly what you want. Giving too much to the symmetric heap will starve you of normal heap for temp arrays.
Giving too little to the symmetric heap will not give you enough memory for space. Although there are other coarray variables in CASUP, space is by far the biggest, so our estimates are based on it.
The upper bound estimate depends on the tool used for collecting job stats. E.g. aprun might give you stats like this:
Application 7371900 resources: utime ~19061s, stime ~176s, Rss ~2960148, inblocks ~30463, outblocks ~43999
where Rss is the Resident Set Size (RSS) memory size in B (is this correct? This needs to be confirmed!) per PE used during the simulation. Check the aprun man page for more details on RSS etc. However, RSS does not include Huge Pages, so this is likely to be a gross underestimate.
Furthermore, I'm not sure if it's possible to know
how much of RSS was in symmetric memory and how much in normal heap.
Profiling is probably by far the best strategy if more accurate memory usage data is needed.