Batch Cluster for Dummies

If you are new to clusters and everything related to them, here is a quick guide to get you started.

Before anything else

You will have to ask your favorite administrator for a directory on lc3 or lc4 and for permission to run on the cluster. This takes roughly one day. In the meantime, have a look at the Sun N1 Grid Engine pages and adjust your Z Shell environment.

Getting started

Log in to lc3 or lc4 and set up the proper environment:

$ ssh lc4
$ cd myWorkingDirectory
$ . /usr/sge/default/common/settings.sh
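To avoid sourcing the settings file by hand every time you log in, one option is to load it from your Z Shell startup file. A minimal sketch for ~/.zshrc, assuming the settings path shown above is valid on every machine where you log in (the guard simply skips it where the file does not exist):

```shell
# ~/.zshrc fragment: load the SGE environment (qsub, qstat, qmon, ...)
# Assumes the settings path used above; adjust it if your site differs.
if [ -f /usr/sge/default/common/settings.sh ]; then
    . /usr/sge/default/common/settings.sh
fi
```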

Write a shell script that actually does the job. If you need to run something like Marlin or Mokka with many very similar steering files, you can generate them automatically with another shell script.
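A minimal sketch of such a generator, written as a shell function you can put in a script or source into your shell. It assumes you keep one template steering file in which the run-dependent value is marked by a placeholder; the template name, the placeholder @RUN@ and the output names steer_N.xml are all hypothetical, so adapt them to your own files:

```shell
# make_steering TEMPLATE FIRST LAST
# Writes steer_FIRST.xml ... steer_LAST.xml, each a copy of TEMPLATE in
# which every @RUN@ is replaced by the run number.
# The @RUN@ placeholder and steer_*.xml names are hypothetical.
make_steering() {
    template=$1
    first=$2
    last=$3
    if [ ! -f "$template" ]; then
        echo "make_steering: template $template not found" >&2
        return 1
    fi
    run=$first
    while [ "$run" -le "$last" ]; do
        # substitute the run number into a fresh copy of the template
        sed "s/@RUN@/$run/g" "$template" > "steer_$run.xml"
        run=$((run + 1))
    done
}
```

For example, make_steering steer-template.xml 1 100 writes steer_1.xml to steer_100.xml in the current directory. Using sed on a template keeps the steering files themselves readable; anything more elaborate than a single substitution may be easier with a here-document.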

Running a script

As a first test, try running the example shell script job-test.sh from the main Batch Cluster page:

$ qsub job-test.sh

Check what is going on:

$ qstat
$ qmon

The first command prints the current status of your jobs in your shell, while qmon opens a GUI. Click on “Job Control” to check waiting, running and finished jobs on the cluster. You can also submit jobs via this GUI: just click on “Submit Jobs”.
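If you generated many steering files (as suggested above for Marlin or Mokka), you can submit one job per file in a loop instead of clicking through qmon. A minimal sketch, assuming a hypothetical job script that takes the steering file as its first argument (qsub passes any arguments given after the script name on to the script) and steering files named steer_*.xml; the -n flag is this sketch's own dry-run switch, not a qsub option:

```shell
# submit_all [-n] JOBSCRIPT
# Submits JOBSCRIPT once for every steer_*.xml in the current directory,
# passing the steering file as the script's first argument ($1).
# With -n the qsub commands are only printed, as a dry run.
submit_all() {
    dry_run=0
    if [ "$1" = "-n" ]; then
        dry_run=1
        shift
    fi
    jobscript=$1
    for f in steer_*.xml; do
        [ -e "$f" ] || continue   # no match: the pattern stays literal
        if [ "$dry_run" = 1 ]; then
            echo "qsub $jobscript $f"
        else
            qsub "$jobscript" "$f"
        fi
    done
}
```

Run submit_all -n run-job.sh first to check the list, then submit_all run-job.sh to actually queue the jobs, and watch them with qstat.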

By default, the standard output, standard error and result files are written to your home directory, which is most probably your AFS directory. If you do not have enough space on AFS (I doubt that anyone has), you had better redirect everything somewhere else. In this case, modify job-test.sh a little:

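#!/bin/sh
# job-test.sh
# Lines starting with "#$" are read by qsub as command-line options.
#$ -l arch=x86
#$ -l distro=sld4
# write standard error and standard output to the pool disk, not to $HOME
#$ -e /nfs/flc/lc3/pool/myWorkingDirectory/stdError.txt
#$ -o /nfs/flc/lc3/pool/myWorkingDirectory/stdOutput.txt

echo "job test" > /nfs/flc/lc3/pool/myWorkingDirectory/job-test.log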

Note: You can give the full path, including /nfs/flc/lc3/, to make sure the cluster knows where to find your files. Alternatively, have a look at the section “Using path aliases” in the Local SGE Documentation to see how to avoid problems if your working directory is located on one of the /pool disks of lc3 or lc4. You can then use the -cwd command-line option (or a special comment of the form #$ -cwd in your script) to write your standard output and standard error to the current working directory.
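A sketch of the same test job using -cwd instead of absolute paths. The script name job-cwd.sh is hypothetical, and it assumes the path aliases are set up so that your submission directory on the pool disk is valid on the execution hosts. By default SGE names the output files after the job (job-cwd.sh.o<jobid> and job-cwd.sh.e<jobid>), and with -cwd they end up in the directory you submitted from:

```shell
#!/bin/sh
# job-cwd.sh: hypothetical variant of job-test.sh using -cwd.
# Submit it from your working directory, e.g.
#   cd /nfs/flc/lc3/pool/myWorkingDirectory && qsub job-cwd.sh
#$ -cwd
#$ -l arch=x86
#$ -l distro=sld4

# relative path: resolved against the directory the job runs in
echo "job test" > job-test.log
```

Outside the batch system the #$ lines are ordinary comments, so you can also run the script directly with sh job-cwd.sh to check it before submitting.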

Queues - OUTDATED

Read: http://www-it/systems/services/batch/sge/index.html

Depending on roughly how long your job will take, you have to choose the queue it waits in. The default queue only allows 5-minute jobs; after that, your job gets killed. So if you want to get real work done, you will most probably have to use another queue. The most common queues are:

name        time limit   comment
default.q   5 min        default without project/group
short.q     1 day        available, default (h_rt < 24:00:00)
long.q      1 week       available, long runner (24:00:00 < h_rt < 168:00:00)

A queue can be selected with one more special comment in your shell script:

#$ -q short.q
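The comment column refers to h_rt, the hard run-time limit of a job. A sketch of a job script that selects long.q and also requests its run time explicitly, assuming the queue names and limits above still apply (this section is marked outdated, so check the page linked above first). The script name job-long.sh and the 48-hour value are hypothetical:

```shell
#!/bin/sh
# job-long.sh: hypothetical job script for a run of about two days.
# -q picks the queue; -l h_rt=hh:mm:ss requests a hard run-time limit,
# which has to fit the queue's range (long.q: 24:00:00 < h_rt < 168:00:00).
#$ -q long.q
#$ -l h_rt=48:00:00
#$ -l arch=x86
#$ -l distro=sld4
#$ -cwd

# record where and when the job started
echo "started on $(uname -n) at $(date)" > job-long.log
```

The same options can also be given on the command line instead, e.g. qsub -q long.q -l h_rt=48:00:00 job-long.sh.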

BatchClusterForDummies (last edited 2009-06-16 18:32:14 by localhost)