CGPACK > JAN-2013
15-JAN-2013: A few simple observations.
At this time the Cray Fortran compiler seems to be the most advanced for coarrays. The Intel compiler allows us to run only shared memory programs. The distributed memory version of the Intel compiler is available, but we haven't got it. It might be available on phase 3, possibly in Spring 2013. Right now we have ifort 12.1.2 20111128 on phase 1 and ifort 12.0.2 20110112 on phase 2. I know of no other compiler ready to use with Fortran coarrays. Let us know if you know of any (mexas@bris.ac.uk).
For more details on Intel coarray support see: Using Coarray Fortran and Distributed Memory Coarray Fortran with the Intel Fortran Compiler for Linux: Essential Guide.
Intel mpdboot and mpdallexit are not needed for shared memory execution.
A simple example from phase 1. The code:
babyblue2% cat coarray1.f90
real :: z[*]
if (this_image()==1) write (*,*) "from image 1: there are", num_images(), "images"
sync all
z=this_image()
write (*,'(a,i0,a,f0.0)') 'image ', this_image(), ', value: ',z
end
babyblue2%
Compiling:
babyblue2% ifort -coarray=shared -coarray-config-file=coarr.conf coarray1.f90
babyblue2%
The -coarray option is required by the compiler. Without this option the coarray language elements cannot be processed (which is stupid since coarrays are a standard feature and must be supported by default).
-coarray-config-file is optional, but useful. It associates a coarray config file with the executable, so that the run-time options can be changed by simply editing the config file, with no recompilation needed. In this simple example:
babyblue2% cat coarr.conf
-envall -n 16 ./a.out
babyblue2%
where -envall "copies your current environment variables to the environment of your CAF processes. HIGHLY RECOMMENDED." (from the Intel manual).
-n specifies how many images will be created. This is required.
Finally the name of the executable is given. This obviously must match the name given at the compilation stage.
Anyway, that's it:
babyblue2% ./a.out
from image 1: there are 16 images
image 2, value: 2.
image 5, value: 5.
image 1, value: 1.
image 4, value: 4.
image 13, value: 13.
image 6, value: 6.
image 12, value: 12.
image 3, value: 3.
image 7, value: 7.
image 8, value: 8.
image 16, value: 16.
image 9, value: 9.
image 14, value: 14.
image 10, value: 10.
image 15, value: 15.
image 11, value: 11.
babyblue2%
Or the job can be submitted to the queue as usual:
babyblue2% cat z.sh
#!/bin/sh
#PBS -l walltime=00:01:00,nodes=1:ppn=1
#PBS -j oe
#PBS -m abe
cd $HOME
./a.out
babyblue2%
babyblue2% qsub -qveryshort z.sh
263042.bluecrystal1.cm.cluster
babyblue2% qstat -u $USER

bluecrystal1.cm.cluster:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
263042.bluecryst     mexas    veryshor z.sh                407     1   1    --  00:01 R   --
babyblue2%
and when the job is finished:
babyblue2% cat z.sh.o263042
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
from image 1: there are 16 images
image 1, value: 1.
image 9, value: 9.
image 14, value: 14.
image 10, value: 10.
image 6, value: 6.
image 2, value: 2.
image 13, value: 13.
image 7, value: 7.
image 4, value: 4.
image 3, value: 3.
image 15, value: 15.
image 5, value: 5.
image 11, value: 11.
image 12, value: 12.
image 8, value: 8.
image 16, value: 16.
babyblue2%
The config file way of doing things (-coarray-config-file) is one option. It is better (in my opinion) to use the environment variable FOR_COARRAY_NUM_IMAGES to specify the number of images at run time. For this, -coarray-config-file should not be used at compilation, or at least the -n option should not be specified in the config file. For example:
bigblue2> ifort -coarray=shared try1.f90
bigblue2>
Then this can be run with e.g. 8 images:
bigblue2> cat bc.sh
#!/bin/sh
#PBS -l walltime=00:01:00,nodes=1:ppn=1
#PBS -j oe
#PBS -m abe
export FOR_COARRAY_NUM_IMAGES=8
cd $HOME/nobackup/cgpack/branches/coarray
./a.out
bigblue2>
bigblue2> cat bc.sh.o1746803
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
from image 1: there are 8 images
image 4, value: 4.
image 2, value: 2.
image 6, value: 6.
image 7, value: 7.
image 8, value: 8.
image 3, value: 3.
image 1, value: 1.
image 5, value: 5.
bigblue2>
or changed to e.g. 10 images:
bigblue2> cat bc.sh
#!/bin/sh
#PBS -l walltime=00:01:00,nodes=1:ppn=1
#PBS -j oe
#PBS -m abe
export FOR_COARRAY_NUM_IMAGES=10
cd $HOME/nobackup/cgpack/branches/coarray
./a.out
bigblue2>
bigblue2> cat bc.sh.o1746804
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
from image 1: there are 10 images
image 1, value: 1.
image 2, value: 2.
image 10, value: 10.
image 6, value: 6.
image 4, value: 4.
image 8, value: 8.
image 9, value: 9.
image 7, value: 7.
image 5, value: 5.
image 3, value: 3.
bigblue2>
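One way to confirm that the variable was picked up is to have the program report it. The following is only a sketch: the program name check_images and the messages are made up for illustration, and it assumes that the environment set in the job script is visible to image 1. It uses only the standard intrinsics get_environment_variable, this_image and num_images.

```fortran
program check_images
  implicit none
  character(len=32) :: val
  integer :: stat, ios, requested

  ! Only image 1 reports, to keep the output short.
  if (this_image() == 1) then
    ! stat is 0 only if the variable exists and its value fits into val.
    call get_environment_variable("FOR_COARRAY_NUM_IMAGES", val, status=stat)
    if (stat == 0) then
      read (val, *, iostat=ios) requested
      if (ios == 0) then
        write (*,'(a,i0,a,i0)') 'requested: ', requested, &
                                ', running: ', num_images()
      else
        write (*,'(2a)') 'cannot parse FOR_COARRAY_NUM_IMAGES: ', trim(val)
      end if
    else
      write (*,'(a,i0)') 'FOR_COARRAY_NUM_IMAGES not usable, running: ', &
                         num_images()
    end if
  end if
end program check_images
```

If the two numbers differ, a likely cause is a -n option left in a config file that was given at compilation.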
23-JAN-2013: Some timings.
This example is from the EPCC coarray training course. An edge image is given and the program reconstructs the original image by double integration. The image size is 672 x 1024 pixels. The integration is carried out for a fixed 100k iterations. No convergence check is made. The results are shown below.
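To make clear what kind of loop is being timed, here is a minimal serial sketch with a wallclock timer. It is not the EPCC course code. The five-point update formula, the initial value 255, the small grid size, the iteration count and all names are assumptions made for illustration only. The edge data is a placeholder, where a real run would read it from a file.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: m = 64, n = 64   ! small grid, NOT the 672 x 1024 image
  integer, parameter :: niter = 1000     ! fixed count, no convergence check
  real :: edge(m,n)                      ! edge data (placeholder values here)
  real :: old(0:m+1,0:n+1)               ! current estimate with a one-cell halo
  real :: new(m,n)                       ! updated estimate
  integer :: i, j, iter, t0, t1, rate

  edge = 0.0
  old = 255.0                            ! assumed initial guess, halo included

  call system_clock(t0, rate)            ! wallclock ticks and ticks per second
  do iter = 1, niter
    do j = 1, n
      do i = 1, m
        ! Assumed update: average of the four neighbours minus the edge term.
        new(i,j) = 0.25*(old(i-1,j) + old(i+1,j) &
                       + old(i,j-1) + old(i,j+1) - edge(i,j))
      end do
    end do
    old(1:m,1:n) = new                   ! halo cells are left unchanged
  end do
  call system_clock(t1)

  write (*,'(a,f0.3,a)') 'wallclock: ', real(t1 - t0)/real(rate), ' s'
end program timing_sketch
```

In a coarray version the grid would be split between images and the halo cells exchanged after each iteration, but that part is not shown here.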
These results are from BlueCrystal phase 2. Our Intel license does not allow distributed memory for coarrays, so these results are from a single node with 8 cores (shared memory). The time is the wallclock time (elapsed or real time). Things to note:
31-JAN-2013: Making a large array of coarrays.
The idea is to have a coarray declared as
allocatable :: coarray(:,:,:)[:,:,:]
then allocate according to needs, e.g.:
allocate(coarray(dim1,dim2,dim3)[codim1,codim2,*])
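Here dim1, dim2, dim3 are the array extents on each image, and codim1 and codim2 are the extents of the first two codimensions, i.e. dimensions 1 and 2 of the grid of images. The final codimension is always given as *, and its extent is set at run time by the number of images.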
The following example was run on 8 images with
allocate(coarray(10,10,10)[2,2,*])
which means the third (final) codimension is also 2, since 8/(2*2)=2. On each image an array of 10*10*10=1000 elements is then created. My logic is then to think of those arrays as arranged in a 2x2x2 grid. This super array then has dimensions 20x20x20, i.e. 8000 elements.
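The same arithmetic can be checked from within a program. The sketch below is not the CGPACK code. The program name and the variables d, cosub, lo and hi are hypothetical. It uses the standard this_image(coarray) intrinsic to get the cosubscripts of each image, and from these the index range that the local array would occupy in the super array.

```fortran
program super_array_sketch
  implicit none
  integer, parameter :: d = 10           ! local extent in each dimension
  integer, allocatable :: coarray(:,:,:)[:,:,:]
  integer :: cosub(3), lo(3), hi(3)

  ! The last coextent is fixed at run time by the number of images.
  allocate (coarray(d,d,d)[2,2,*])

  cosub = this_image(coarray)            ! cosubscripts of this image
  lo = (cosub - 1)*d + 1                 ! first super array index, per dimension
  hi = cosub*d                           ! last super array index, per dimension

  write (*,'(a,i0,a,3(1x,i0),a,3(1x,i0),a,3(1x,i0))') &
    'image ', this_image(), ' cosubscripts', cosub, ' lower', lo, ' upper', hi
end program super_array_sketch
```

On 8 images, image 2 has cosubscripts 2 1 1, so its local array would cover elements 11 to 20 along dimension 1 and 1 to 10 along dimensions 2 and 3 of the super array.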
The picture below was obtained with ParaView. Each image assigns this_image() to all elements of its local coarray. Then a routine is called from image 1 that writes the coarrays from all images in order. The resulting binary file describes the super array.
Note from the picture that the super array is arranged in the natural Fortran "array element order. It is obtained by counting most rapidly in the early dimensions" (from Metcalf et al (2011) Modern Fortran Explained, Oxford). In the picture "X" corresponds to dimension 1 of the array, "Y" corresponds to dimension 2 of the array, and "Z" corresponds to dimension 3 of the array. One can trace this logic by following the colours and the values, which are simply image numbers. The array from image 1 is completely hidden from view.
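The writing step can be sketched as follows. This is an illustration under assumptions, not the routine used for the result above: the file name super.raw, the use of stream access, the buffer buf and the loop layout are all hypothetical. Image 1 pulls one column at a time from each image, with the loops nested so that the first dimension of the super array varies fastest.

```fortran
program write_super_array
  implicit none
  integer, parameter :: d = 10           ! local extent in each dimension
  integer, allocatable :: coarray(:,:,:)[:,:,:]
  integer :: buf(d), ucob(3), c1, c2, c3, j, k, u

  allocate (coarray(d,d,d)[2,2,*])
  coarray = this_image()

  ! Every cosubscript triple must map to an existing image.
  ucob = ucobound(coarray)
  if (product(ucob) /= num_images()) error stop 'image grid is not full'

  sync all                               ! all data set before image 1 reads it

  if (this_image() == 1) then
    open (newunit=u, file='super.raw', access='stream', &
          form='unformatted', status='replace')
    do c3 = 1, ucob(3)                   ! slowest: dimension 3 of the super array
      do k = 1, d
        do c2 = 1, ucob(2)               ! then dimension 2
          do j = 1, d
            do c1 = 1, ucob(1)           ! fastest: dimension 1
              buf = coarray(:,j,k)[c1,c2,c3]   ! remote read of one column
              write (u) buf
            end do
          end do
        end do
      end do
    end do
    close (u)
  end if
end program write_super_array
```

This does many small remote reads and is meant to show the ordering only. Copying whole local arrays to image 1 first would likely be faster.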