CASUP FAQ
For any questions, bug reports, feedback or other comments, please submit a ticket.
A compiler that supports Fortran 2008 coarray features is required. If the compiler also supports Fortran 2015 coarray features, particularly collectives, then a richer set of capabilities will be available from CASUP.
An MPI library is required for the MPI/IO module cgca_m2mpiio.f90 (the source is in the svn repo). Typically an MPI library will be installed as part of the compiler or runtime library installation; e.g. an OpenCoarrays installation will automatically check for and install some MPI distribution, possibly MPICH or OpenMPI. The Intel compiler is probably best used with the Intel MPI libraries, although it is possible to use some other MPI distribution.
An HDF5 library is required for the HDF5 module cgca_m2hdf5.f90, and a NetCDF library for the NetCDF module cgca_m2netcdf.f90 (sources in the svn repo).
Further software is required for some platforms. Some Makefiles, e.g. those with tau or scorep in their names, are for building CASUP with instrumentation, for profiling and tracing analysis.
On unix/linux use the svn checkout command, which can be abbreviated to svn co. If you just want to use CASUP, you can use either read-only protocol (https or svn). However, if you have commit access to CASUP, you must use the read/write protocol. The exact commands for each protocol are helpfully suggested on the project's SourceForge code page.
You can pull the whole tree into some directory zzz, e.g.:

mkdir zzz
svn co https://svn.code.sf.net/p/cgpack/code/ zzz

or

mkdir zzz
cd zzz
svn co https://svn.code.sf.net/p/cgpack/code/ .

or you can pull just a particular directory, e.g.:

svn co https://svn.code.sf.net/p/cgpack/code/head .

or a release:

svn co svn://svn.code.sf.net/p/cgpack/code/realeses/2.9/ .
Use an ssh tunnel. A lot of help on this is available online, so let's just look at some simple examples.
A typical example is when you want to use CASUP
on a remote system, e.g. an HPC system, which has
some restrictions on network traffic.
If you want to use the svn protocol, then from your terminal you have to establish an ssh tunnel to the remote host, e.g.:

ssh -R12345:svn.code.sf.net:3690 <username>@<system>

where 3690 is the standard subversion port and 12345 is an arbitrarily chosen free port.
This command should log you into the remote host
and establish the tunnel.
Then from the remote host do e.g.
svn co svn://localhost:12345/p/cgpack/code/ .
where the port number must match that given in the ssh command.
If you need to use the https protocol, then you need to use the standard https port 443:
ssh -R12345:svn.code.sf.net:443 <username>@<system>
Then from the remote host do e.g.
svn co https://localhost:12345/p/cgpack/code/ .
We are working on an autotools based build system, but this hasn't been completed yet. In the meantime, there is a collection of Makefiles for different platforms:
Makefile-archer
Makefile-bc
Makefile-bc3-ifort-shared
Makefile-bc3-mpiifort
Makefile-bc3-mpiifort-tau
Makefile-bc3-oca
Makefile-FreeBSD
Makefile-mpiifort-scorep
If there is a
Makefile for your exact platform, then just use it,
e.g. on FreeBSD with
make -f Makefile-FreeBSD
The build will produce a number of object files, the unix library archive libcgpack.a and, on some platforms, also module and submodule files. Refer to the comments in the provided Makefiles for more details.
This error is from the GCC Fortran compiler, gfortran, which prior to version 7 did not support coarrays of derived type with allocatable or pointer components. The obvious solution is to upgrade to GCC 7. If this is not possible, a smaller subset of CASUP library routines can be built by excluding the modules and submodules which throw this error, i.e. those that declare such coarrays. You'd also need to comment out the corresponding references from the top level module.
Yes. Look at these tests (source code in the svn repo): ABM, ABN, ABO, ABV, ABW, ACD and ACF. You'll see calls to CASUP routines which either call MPI/IO directly, or call NetCDF or HDF5 routines, which themselves call MPI/IO. One CASUP routine uses raw MPI/IO and is called in several of these tests; all 3 I/O routines are used in one of the tests.
The 3 compilers, Cray, Intel and OpenCoarrays, seem to work with mixed coarray/MPI code, although Intel does not officially support this.
Some tests, e.g. testABK (the source is in the svn repo), call the CrayPAT (Cray Performance Analysis Tool) API routines PAT_region_begin and PAT_region_end. Clearly such tests can be built only on Cray platforms. To link such tests into executables, the perftools module must first be loaded.
Otherwise you see errors such as these on linking:
ftn testABK.o -o testABK.x testaux.o -L<path to your CASUP library> -lcgpack
cgpack/head/tests/testABK.f90:8: undefined reference to `pat_region_begin_'
cgpack/head/tests/testABK.f90:144: undefined reference to `pat_region_end_'
cgpack/head/tests/testABK.f90:8: undefined reference to `pat_region_begin_'
cgpack/head/tests/testABK.f90:158: undefined reference to `pat_region_end_'
Once the perftools module has been loaded, the linking should proceed fine:

uzi@eslogin001:~/cgpack/head/tests> module load perftools
uzi@eslogin001:~/cgpack/head/tests> make -f Makefile-archer
ftn testABK.o -o testABK.x testaux.o -L/home/ecse0505/ecse0505/uzi/lib -lcgpack
uzi@eslogin001:~/cgpack/head/tests>
Some tests take 2 integer arguments from the command line. At present these are tests AAA to ABB, ABD to ABL, and ABN to ABS.
For example, test AAF built with GCC/OpenCoarrays can be run on 16 images with command line arguments 2 and 2:
cafrun -np 16 ./testAAF.x 2 2
These arguments are codimensions 1 and 2.
Codimension 3 is calculated as the total number
of images divided by codim1 and by codim2.
If codimension 3 is not a positive integer,
the program aborts.
In this example the 3rd codimension is
16 / (2*2) = 4 .
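Since the run aborts when the image count is not divisible by the product of the first two codimensions, it can be worth checking the decomposition in your launch script before submitting a job. Below is a minimal sketch in shell; the values are the hypothetical ones from the example above (16 images, codimensions 2 and 2), not anything CASUP mandates:

```shell
#!/bin/sh
# Check that codimension 3 will be a positive integer
# before launching a CASUP test (values from the FAQ example).
NP=16   # total number of images
C1=2    # codimension 1
C2=2    # codimension 2
if [ $(( NP % (C1 * C2) )) -ne 0 ]; then
  echo "error: $NP images do not factor into ${C1} x ${C2} x N" >&2
  exit 1
fi
C3=$(( NP / (C1 * C2) ))
echo "codimension 3: $C3"
```

The same check applied to the Cray example further down (24000 images, codimensions 40 and 30) gives codimension 3 = 20.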
With the Intel compiler, when running on a single node, i.e. with only shared memory, you can launch simply as:
./testAAF.x 2 2
Other tests do not need command line arguments. If you want to use the Intel compiler with distributed memory, you first need to compile in a special way and to create a coarray config file. The command line arguments are written to that config file, and you then launch the test with the executable name alone, e.g. ./testAAF.x.
On Cray use aprun with the usual job options, etc., giving codimensions 1 and 2 on the command line, e.g.:

#PBS -l select=1000
aprun -n 24000 -N 24 -S 12 -d 1 -T ./testAAF.x 40 30
This example is from ARCHER, where nodes have 24 cores.
The 3rd codimension will be
24000 / (40*30) = 20.
Just choose an appropriate template job script, e.g.:
$ cat templates/job.mpiifort.pbs
#!/bin/bash --login
#$Id: job.mpiifort.pbs 407 2017-05-16 15:23:59Z mexas $
#
# mpiifort on PBS
#
# submit with e.g.
# qsub -q testq -j oe -l walltime=00:10:00,nodes=2:ppn=16 job

# Load the Intel module
module add languages/intel-compiler-16-u2
module list

# Set some helpful vars
EXE=testABW.x
CACONF=xx14.conf
CONFILE=nodes

# Switch to current working directory
cd $PBS_O_WORKDIR

# Prepare the MPI machine file and calculate the number of procs
cat $PBS_NODEFILE > $CONFILE
NNODES=`cat $PBS_NODEFILE | wc -l`

# Prepare conf file
echo -genvall -genv I_MPI_DEBUG=2 -genv I_MPI_FABRICS=shm:dapl \
 -machinefile ./$CONFILE -n $NNODES ./$EXE > $CACONF

# Run
echo "START:" `date`
./$EXE
echo "END:  " `date`
This PBS script is for the mpiifort (Intel Fortran over MPI) compiler/libraries. The executable file is testABW.x. Some Intel MPI envars are set via the -genv flags written into the coarray config file. The comment at the top gives an example command to submit this script to PBS. Adjust to your environment!
You might get runtime errors such as e.g.
ERROR: cgca_sld/cgca_m3sld: allocate( array ), err stat: 4205 err message: The program was unable to request more memory space.
This particular message is from Cray. On other systems the STAT value and the error message might be different, but the problem is that you don't have enough memory to allocate further arrays. At this point you are probably wondering: how do I predict the memory usage of a CASUP program?
No easy answer can be given, but there are 2 possible strategies leading to the lower and the upper bound estimates. The lower bound is based on the input parameters, such as the box size, the mean grain size, the spatial resolution, the number of information layers in the space array, etc. The upper bound is based on a successful completion of a CASUP program and on collecting the job memory usage stats.
First the lower bound.
The total number of CA cells can be estimated from the CA box dimensions, the mean grain size, and the spatial resolution (cells per mean grain). Consider this fragment (the variable names are taken from one of the test programs):
! physical dimensions of the box, assume mm
bsz0 = (/ 4.0, 3.0, 5.0 /)
! mean grain size, linear dimension, e.g. mean grain diameter, also mm
dm = 1.0e-1
! resolution
res = 1.0e5
The box volume, V, is the product of the 3 box dimensions, i.e. 4.0 * 3.0 * 5.0 = 60 mm^3. The volume of the mean grain, Vm, is roughly dm^3 = 1.0e-3 mm^3. There are roughly V/Vm = 6.0e4 grains in the model, or, multiplying by the resolution, roughly 6.0e9 cells in the model. Then divide by the number of images: e.g. if using a single XC40 or XC30 node with 24 cores, you probably want to have 24 images. So each image will have roughly 6.0e9 / 24 = 2.5e8 cells.
Then consider that the cells are stored in

integer( kind=iarr ), allocatable :: space(:,:,:,:) [:,:,:]

where iarr is the integer kind for microstructure arrays, the first 3 dimensions of space are the spatial dimensions of the CA space "shoe-box", and the 4th dimension of space is the number of information layers of CA, e.g. 2 (one for microstructure, and the second for fracture information). If iarr uses 4 bytes, then each image will use roughly 2.5e8 * 4 * 2 = 2.0e9 bytes, or 2GB.
That is a bare minimum.
If using solidification or fracture routines, which is most likely in any real CASUP program, then you'll need to create temp arrays of the same size as space, but with only a single information layer. So for our example this adds another 1GB, i.e. the absolute minimum total is 3GB.
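The arithmetic above can be reproduced in a few lines of shell and awk. All numbers below are this FAQ's example values, not universal constants:

```shell
#!/bin/sh
# Lower-bound per-image memory estimate, using the FAQ's example values.
out=$(awk 'BEGIN {
  V      = 4.0 * 3.0 * 5.0   # box volume, mm^3
  dm     = 1.0e-1            # mean grain size, mm
  Vm     = dm * dm * dm      # mean grain volume, mm^3
  res    = 1.0e5             # cells per mean grain
  images = 24                # one 24-core node, 1 image per core
  bytes  = 4                 # bytes per cell, kind iarr
  layers = 2                 # information layers in space
  cells  = V / Vm * res / images    # cells per image: 2.5e8
  space  = cells * bytes * layers   # space coarray: 2.0e9 B
  temp   = cells * bytes            # 1-layer temp arrays: 1.0e9 B
  printf "space %.1f GB, temp %.1f GB, total %.1f GB",
         space / 1.0e9, temp / 1.0e9, (space + temp) / 1.0e9
}')
echo "$out"
```

Running this prints space 2.0 GB, temp 1.0 GB, total 3.0 GB, matching the estimate above.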
However! Coarray variables, at least on Cray systems, are allocated from the symmetric heap, i.e. 2GB for space, while the temp arrays are not coarrays and are allocated from the standard heap memory. On Cray you need to define the envar XT_SYMMETRIC_HEAP_SIZE to specify how much memory on each PE should be dedicated to coarray variables.
An XC40 system has 128GB of memory per 24-core node, i.e. about 5.3GB per PE. Setting

export XT_SYMMETRIC_HEAP_SIZE=3g

will leave 3GB for coarray variables and 2.3GB for heap memory objects. This is roughly what you want.
Giving too much to symmetric heap will starve you of normal heap
for temp arrays.
Giving too little to symmetric heap will not leave enough room for coarray variables such as space.
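A quick sanity check of this split, again with the FAQ's example numbers (128GB XC40 node, 24 PEs, 3g symmetric heap; the variable names are just for illustration):

```shell
#!/bin/sh
# Per-PE memory split on a 128GB, 24-core XC40 node with
# XT_SYMMETRIC_HEAP_SIZE=3g (FAQ example values).
NODE_GB=128
NPES=24
SYM_GB=3
per_pe=$(awk "BEGIN { printf \"%.1f\", $NODE_GB / $NPES }")
heap=$(awk "BEGIN { printf \"%.1f\", $NODE_GB / $NPES - $SYM_GB }")
echo "per PE: $per_pe GB = $SYM_GB GB symmetric + $heap GB normal heap"
```

For other node sizes or core counts, adjust NODE_GB and NPES and pick a symmetric heap size that leaves your temp arrays enough normal heap.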
Although there are other coarray variables in CASUP,
space is by far the biggest, so our estimates
are based on it.
The upper bound estimate depends on the tool used for
collecting job stats.
On Cray, aprun might give you stats like this:
Application 7371900 resources: utime ~19061s, stime ~176s, Rss ~2960148, inblocks ~30463, outblocks ~43999
Here Rss is presumably the Resident Set Size (RSS) memory, in bytes, per PE used during the simulation (is this correct? This needs confirming!). See the aprun man page for more details on RSS etc. However, RSS does not include Huge Pages, so this is likely to be a gross underestimate.
Furthermore, I'm not sure if it's possible to know
how much of RSS was in symmetric memory and how much in normal heap.
Profiling is probably by far the best strategy if more accurate memory usage data is needed.