CASUP FAQ
For any questions, bug reports, feedback or other comments, please submit a ticket.
A compiler that supports Fortran 2008 coarray features is required. If the compiler also supports Fortran 2015 coarray features, particularly collectives, then a richer set of capabilities will be available from CASUP.
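As a quick check of compiler support (a standalone snippet, not part of CASUP), the following program uses both a coarray and the co_sum collective, so it should build and run only if both feature sets are available:

! Minimal check: a scalar coarray (Fortran 2008) plus the co_sum
! collective (from the Fortran 2015/2018 additions).
program caf_check
  implicit none
  integer :: n[*]              ! scalar integer coarray
  n = this_image()
  call co_sum( n )             ! collective sum over all images
  if ( this_image() .eq. 1 ) write (*,*) "sum of image numbers:", n
end program caf_check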
Module cgca_m2mpiio.f90 (the source in the svn repo: cgca_m2mpiio.f90) needs an MPI library.
Typically an MPI library will be installed as part of the compiler or runtime library installation, e.g. an OpenCoarrays installation will automatically check for and install some MPI distribution, possibly MPICH or OpenMPI. The Intel compiler is probably best used with the Intel MPI libraries, although it is possible to use some other MPI distribution.
Module cgca_m2hdf5.f90 (the source in the svn repo: cgca_m2hdf5.f90) needs an HDF5 library.
Module cgca_m2netcdf.f90 (the source in the svn repo: cgca_m2netcdf.f90) needs a NetCDF library.
Further software is required for some platforms. For example, Makefile-bc3-mpiifort-tau (the source in the svn repo: Makefile-bc3-mpiifort-tau) is for building CASUP with TAU instrumentation, for profiling and tracing analysis.
On unix/linux use the svn checkout command, which can be abbreviated to svn co. If you just want to use CASUP, you can use either the svn or the https protocol. However, if you have commit access to CASUP, you must use the https protocol! The exact commands for each protocol are helpfully suggested by SourceForge.
You can pull the whole tree into some directory zzz, e.g.

mkdir zzz
svn co https://svn.code.sf.net/p/cgpack/code/ zzz

or

mkdir zzz
cd zzz
svn co https://svn.code.sf.net/p/cgpack/code/ .

or you can pull just a particular directory, e.g. head:

svn co https://svn.code.sf.net/p/cgpack/code/head .

or a release:

svn co svn://svn.code.sf.net/p/cgpack/code/releases/2.9/ .
Use an ssh tunnel. A lot of help on this is available online, so let's just look at some simple examples.
A typical example is when you want to use CASUP
on a remote system, e.g. an HPC system, which has
some restrictions on network traffic.
If you want to use the svn protocol, then from your terminal you have to establish an ssh tunnel to the remote host, e.g.
ssh -R12345:svn.code.sf.net:3690 <username>@<system>
where 3690 is the standard subversion port, and 12345 is an arbitrarily chosen free port.
This command should log you into the remote host
and establish the tunnel.
Then from the remote host do e.g.
svn co svn://localhost:12345/p/cgpack/code/ .
where the port number must match that given in the ssh command.
If you need to use the https protocol, then you need to use the standard https port 443:
ssh -R12345:svn.code.sf.net:443 <username>@<system>
Then from the remote host do e.g.
svn co https://localhost:12345/p/cgpack/code/ .
We are working on an autotools-based build system, but this hasn't been completed yet. In the meantime, there is a collection of Makefiles for different platforms:
Makefile-archer
Makefile-bc
Makefile-bc3-ifort-shared
Makefile-bc3-mpiifort
Makefile-bc3-mpiifort-tau
Makefile-bc3-oca
Makefile-FreeBSD
Makefile-mpiifort-scorep
If there is a Makefile for your exact platform, then just use it, e.g. on FreeBSD with OpenCoarrays installed:
make -f Makefile-FreeBSD
The build will produce a number of object files, *.o, the unix library archive libcgpack.a, and, on some platforms, also module and submodule files, *.mod and *.smod. Refer to the online documentation on the provided Makefiles for more details.
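As a minimal sketch of how these build products might be used (the program name, paths and build command below are placeholders, assuming the OpenCoarrays caf compiler wrapper), a user program pulls in the library via the top level module cgca and links against libcgpack.a:

! myprog.f90 - hypothetical minimal program using CASUP.
! Possible build command (paths are placeholders), e.g. with OpenCoarrays:
!   caf myprog.f90 -I<path to *.mod and *.smod> -L<path to libcgpack.a> -lcgpack -o myprog.x
program myprog
  use cgca                     ! top level CASUP module
  implicit none
  if ( this_image() .eq. 1 ) write (*,*) "CASUP linked, images:", num_images()
end program myprog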
This error is from the GCC Fortran compiler, gfortran,
which prior to version 7 did not support
coarrays of derived type with allocatable or pointer
components.
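For illustration, here is a minimal sketch (not taken from CASUP) of the kind of declaration that such compilers reject:

program coarray_demo
  implicit none
  ! A coarray of a derived type with an allocatable component:
  ! gfortran prior to GCC 7 rejects this declaration.
  type :: t
    integer, allocatable :: a(:)
  end type t
  type( t ) :: x[*]
  allocate( x%a(10) )
  if ( this_image() .eq. 1 ) write (*,*) "allocated size:", size( x%a )
end program coarray_demo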
The obvious solution is to upgrade to GCC 7. If this is not possible, a smaller subset of the CASUP library routines can be built by excluding the modules and submodules which throw this error.
At present these are module cgca_m3pfem (source code: cgca_m3pfem.f90) and submodule m3pfem_sm1 (source code: m3pfem_sm1.f90). You'd also need to comment out the cgca_m3pfem module from the top level module cgca:

!use cgca_m3pfem
Yes. Look at these tests:

ABM, source code: testABM.f90
ABN, source code: testABN.f90
ABO, source code: testABO.f90
ABV, source code: testABV.f90
ABW, source code: testABW.f90
ACD, source code: testACD.f90
ACF, source code: testACF.f90
You'll see calls to CASUP routines which in turn call MPI/IO directly, or call NetCDF or HDF5 routines, which themselves use MPI/IO.
For example, cgca_pswci (source: m2out_sm2_mpi.f90) uses raw MPI/IO. This CASUP routine is called in tests ABM, ABN, ABO, ABV, ABW and ACD.
CASUP routine cgca_pswci2 uses MPI/IO. Routine cgca_pswci3 uses NetCDF. Routine cgca_pswci4 uses HDF5. All 3 routines are used in test ACF.
The 3 compilers, Cray, Intel and OpenCoarrays, seem to work with coarray/MPI code, although Intel does not officially support this.
Some tests, e.g. testABK (the source in the svn repo: testABK.f90), use CrayPAT (the Cray Performance Analysis Tool) API calls, such as PAT_region_begin or PAT_region_end. Clearly such tests can be built only on Cray platforms.
To link such tests into executables, the Cray module perftools must first be loaded. Otherwise you see errors such as these on linking:
ftn testABK.o -o testABK.x testaux.o -L<path to your CASUP library> -lcgpack
cgpack/head/tests/testABK.f90:8: undefined reference to `pat_region_begin_'
cgpack/head/tests/testABK.f90:144: undefined reference to `pat_region_end_'
cgpack/head/tests/testABK.f90:8: undefined reference to `pat_region_begin_'
cgpack/head/tests/testABK.f90:158: undefined reference to `pat_region_end_'
After the perftools module has been loaded, the linking should proceed fine:

uzi@eslogin001:~/cgpack/head/tests> module load perftools
uzi@eslogin001:~/cgpack/head/tests> make -f Makefile-archer
ftn testABK.o -o testABK.x testaux.o -L/home/ecse0505/ecse0505/uzi/lib -lcgpack
uzi@eslogin001:~/cgpack/head/tests>
Tests which call the getcodim subroutine from module testaux.f90 take 2 integer arguments from the command line. At present these are tests from AAA to ABB, from ABD to ABL, and from ABN to ABS.
For example, test AAF can be run with the GCC/OpenCoarrays cafrun launcher with command line arguments 2 and 2:
cafrun -np 16 ./testAAF.x 2 2
These arguments are codimensions 1 and 2.
Codimension 3 is calculated as the total number
of images divided by codim1 and by codim2.
If codimension 3 is not a positive integer,
the program aborts.
In this example the 3rd codimension is 16 / (2*2) = 4.
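This logic can be sketched as a small standalone program (an illustration only, not the actual getcodim code; the program and variable names here are made up):

! Codimension 3 = num_images() / (codim1*codim2); the division must be exact.
program codim_check
  implicit none
  integer :: c1, c2, c3
  c1 = 2 ; c2 = 2                   ! codimensions 1 and 2 from the command line
  if ( mod( num_images(), c1*c2 ) .ne. 0 ) &
    error stop "num_images() is not divisible by codim1*codim2"
  c3 = num_images() / ( c1*c2 )     ! e.g. with 16 images: 16 / (2*2) = 4
  if ( this_image() .eq. 1 ) write (*,*) "codimension 3:", c3
end program codim_check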
With the Intel compiler, when running on a single node, i.e. with only shared memory, you can launch simply as:
./testAAF.x 2 2
Other tests do not need command line arguments. If you want to use the Intel compiler with distributed memory, first you need to compile in a special way and create a coarray config file. The command line arguments are written to that config file, and then you can launch the test simply as:
./testAAF.x
On Cray use aprun with the usual job options, such as -n, -N, -S, -d, -T, etc., and give codimensions 1 and 2 on the command line, e.g.:
#PBS -l select=1000
aprun -n 24000 -N 24 -S 12 -d 1 -T ./testAAF.x 40 30
This example is from ARCHER, where nodes have 24 cores. The 3rd codimension will be 24000 / (40*30) = 20.
Just choose an appropriate job.*.pbs file from the tests/templates directory, e.g. job.mpiifort.pbs:
$ cat templates/job.mpiifort.pbs
#!/bin/bash --login
#$Id: job.mpiifort.pbs 407 2017-05-16 15:23:59Z mexas $
#
# mpiifort on PBS
#
# submit with e.g.
# qsub -q testq -j oe -l walltime=00:10:00,nodes=2:ppn=16 job

# Load the Intel module
module add languages/intel-compiler-16-u2
module list

# Set some helpful vars
EXE=testABW.x
CACONF=xx14.conf
CONFILE=nodes

# Switch to current working directory
cd $PBS_O_WORKDIR

# Prepare the MPI machine file and calculate the number of procs
cat $PBS_NODEFILE > $CONFILE
NNODES=`cat $PBS_NODEFILE | wc -l`

# Prepare conf file
echo -genvall -genv I_MPI_DEBUG=2 -genv I_MPI_FABRICS=shm:dapl \
 -machinefile ./$CONFILE -n $NNODES ./$EXE > $CACONF

# Run
echo "START:" `date`
./$EXE
echo "END: " `date`
This PBS script is for the mpiifort (Intel Fortran over MPI) compiler/libraries. The executable file is testABW.x. There are some Intel MPI envars, which are set via -genv. The comment at the top gives an example qsub command to submit this file to PBS. Adjust to your environment!
You might get runtime errors such as:
ERROR: cgca_sld/cgca_m3sld: allocate( array ), err stat: 4205 err message: The program was unable to request more memory space.
This particular message is from Cray. On other systems the STAT value and the error message might be different, but the problem is that you don't have enough memory to allocate further arrays. At this point you are probably wondering: how do I predict the memory usage of a CASUP program?
No easy answer can be given, but there are 2 possible strategies leading to the lower and the upper bound estimates. The lower bound is based on the input parameters, such as the box size, the mean grain size, the spatial resolution, the number of information layers in the space array, etc. The upper bound is based on a successful completion of a CASUP program and on collecting the job memory usage stats.
First, the lower bound. The total number of CA cells can be estimated from the CA box dimensions, bsz0, the mean grain size, dm, and the spatial resolution (cells per mean grain), res. These particular variable names are taken from test ABV, source code: testABV.f90. Consider this fragment:
! physical dimensions of the box, assume mm
bsz0 = (/ 4.0, 3.0, 5.0 /)
! mean grain size, linear dimension, e.g. mean grain diameter, also mm
dm = 1.0e-1
! resolution
res = 1.0e5
The box volume, V, is the product of the 3 box dimensions, i.e. V = 4.0 * 3.0 * 5.0 = 60 mm^3. The volume of the mean grain, Vm, is roughly dm**3, or Vm = 1.0e-3 mm^3. There are roughly V/Vm = 6.0e4 grains in the model, or, multiplying by the resolution, res, there are roughly 6.0e9 cells in the model. Then divide by the number of images, e.g. if using a single XC40 or XC30 node with 24 cores, you probably want to have 24 images. So each image will have roughly 6.0e9 / 24 = 2.5e8 cells.
Then consider that cells are stored in the space array:

integer( kind=iarr ), allocatable :: space(:,:,:,:) [:,:,:]

where iarr is the integer kind for microstructure arrays, the first 3 dimensions of space are the spatial dimensions of the CA space "shoe-box", and the 4th dimension of space is the number of information layers of CA, e.g. 2 (one for microstructure, and the second for fracture information). If iarr uses 4 bytes, then space on each image will use roughly 2.5e8 * 4 * 2 = 2.0e9 bytes, or 2GB. That is a bare minimum.
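The same arithmetic can be written down as a small standalone program (a sketch only; apart from bsz0, dm and res, the variable names below are made up here):

! Lower bound memory estimate, following the reasoning above.
program mem_estimate
  implicit none
  real :: bsz0(3), dm, res, vol, cells, cells_img, bytes_img
  integer :: nimgs, nlayers, bytes_per_cell
  bsz0 = (/ 4.0, 3.0, 5.0 /)     ! box dimensions, mm
  dm   = 1.0e-1                  ! mean grain size, mm
  res  = 1.0e5                   ! cells per mean grain
  nimgs = 24                     ! images, e.g. a single 24-core node
  nlayers = 2                    ! information layers in space
  bytes_per_cell = 4             ! if iarr uses 4 bytes
  vol       = bsz0(1) * bsz0(2) * bsz0(3)          ! 60 mm^3
  cells     = vol / dm**3 * res                    ! ~6.0e9 cells in the model
  cells_img = cells / real( nimgs )                ! ~2.5e8 cells per image
  bytes_img = cells_img * bytes_per_cell * nlayers ! ~2.0e9 bytes, i.e. ~2GB
  write (*,*) "space per image, GB:", bytes_img / 1.0e9
end program mem_estimate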
If using solidification or fracture routines, which is most likely in any real CASUP program, then you'll need to create temp arrays of the same size as space, although with only a single information layer. So for our example this adds another 1GB, i.e. the absolute minimum total is 3GB.
However! Coarray variables, at least on Cray systems, are allocated from the symmetric heap, i.e. 2GB for space, while the temp arrays are not coarrays and are allocated from the standard heap memory. On Cray you need to define the envar XT_SYMMETRIC_HEAP_SIZE to specify how much memory on each PE should be dedicated to coarray variables.
An example: the Hazel Hen XC40 system has 128GB of memory per 24-core node, i.e. about 5.3GB per PE. So setting export XT_SYMMETRIC_HEAP_SIZE=3g will leave 3GB for coarray variables and 2.3GB for heap memory objects. This is roughly what you want. Giving too much to the symmetric heap will starve you of normal heap for temp arrays.
Giving too little to the symmetric heap will not give you enough memory for space. Although there are other coarray variables in CASUP, space is by far the biggest, so our estimates are based on it.
The upper bound estimate depends on the tool used for collecting job stats. E.g. aprun might give you stats like this:
Application 7371900 resources: utime ~19061s, stime ~176s, Rss ~2960148, inblocks ~30463, outblocks ~43999
where Rss is the Resident Set Size (RSS) memory size in B (is this correct? This needs to be confirmed!) per PE used during the simulation. Check the aprun man page for more details on RSS etc. However, RSS does not include Huge Pages, so this is likely to be a gross underestimate.
Furthermore, I'm not sure if it's possible to know
how much of RSS was in symmetric memory and how much in normal heap.
Profiling is probably by far the best strategy if more accurate memory usage data is needed.