20 Batch job submission

1   Create a job

To run a job on the system, you need to create a job script. A job script is a regular shell script (bash or csh) with directives that specify the number of CPUs, the amount of memory and so on; the batch system interprets these directives when the job is submitted. See here for a complete job script example with comments.
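
As a rough illustration of what such a script can look like, here is a minimal sketch. The job name, node count, cores per node, walltime and memory values are assumptions chosen for the example, not recommended settings, and ./my_program is a hypothetical executable.

```shell
#!/bin/bash
# Minimal Torque job script sketch; all resource values below are assumptions.
#PBS -N example_job
#PBS -lnodes=2:ppn=8
#PBS -lwalltime=01:00:00
#PBS -lpmem=1gb

# Torque sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when run outside the batch system.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job ${PBS_JOBID:-none} starting in $workdir on $(uname -n)"

# Replace with the real workload; ./my_program is a hypothetical name.
# ./my_program
```

The #PBS lines are ordinary comments to the shell, so the script can also be run directly for a quick syntax check before it is submitted.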

2   Choosing network

The Stallo cluster has two types of network: infiniband and gigabit ethernet. If your application needs a fast network, you should use infiniband. You select the network type by adding the appropriate job parameters to the job script or the command line. For instance:

infiniband: qsub -lnodes=64:ib .........
ethernet: qsub -lnodes=64:gige .......

If you do not specify a network, you will get whichever becomes available first. To check whether your application needs a fast interconnect, run the same job on both networks and see if the runtime differs significantly.
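
One way to set up such a comparison is sketched below. The helper name network_test_cmds and the job names are made up for this example. The function only prints the two qsub command lines so they can be reviewed first; pipe the output to sh, or paste the lines, to actually submit.

```shell
# Print one qsub command line per network type for the same job script.
# Usage: network_test_cmds JOBSCRIPT NODES
network_test_cmds() {
  local jobscript=$1
  local nodes=$2
  local net
  for net in ib gige; do
    # -N gives each test job a distinct name so the runs are easy to tell apart.
    echo "qsub -N nettest_${net} -lnodes=${nodes}:${net} ${jobscript}"
  done
}

network_test_cmds jobscript.sh 64
```

Comparing the walltime of the two finished jobs then shows whether the faster interconnect makes a difference for the application.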

3   Manage a job

A job's lifecycle can be managed with as few as three commands:

  1. Submit the job with qsub jobscript.sh.
  2. Check the job status with showq. (To limit the display to your own jobs, use showq -u username.)
  3. (optional) Delete the job with qdel jobid.
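
The three steps can be strung together roughly as follows. This is a sketch: the function names jobid_number and run_and_check are invented for the example, and it assumes qsub prints a job identifier of the form number.server, which is the usual Torque behaviour.

```shell
# qsub prints the new job's identifier, typically "<number>.<server>".
# Strip the server part to get the numeric job id.
jobid_number() {
  echo "${1%%.*}"
}

# Submit a job script, then list the user's jobs.
# Needs qsub and showq on the PATH, so it is only defined here, not called.
run_and_check() {
  local jobid
  jobid=$(qsub "$1") || return 1
  echo "Submitted job $(jobid_number "$jobid")"
  showq -u "$USER"
  # qdel "$jobid"   # uncomment to delete the job again
}
```

On a login node this would be used as run_and_check jobscript.sh.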

4   List of useful commands

4.1   Torque commands

See the man page for each command for details.

qsub: Submit jobs. All job parameters can be specified on the command line or in the job script. Command line arguments take precedence over directives in the script.
qstat: Show jobs in the queue. Jobs will be sorted by submit order.
qdel: Delete a job. Use qdel all to terminate all your jobs immediately.

4.2   Maui commands

For details run the command with the -h option.

showq: List jobs in the queue. Jobs will be sorted by time to completion. To see only the jobs of a specific user, use showq -u username.
checkjob: Show details about a specific job.
checknode: Show details about the state of a specific compute node.

6   List of classes/queues, incl. short description and limitations

In general it is not necessary to specify a queue for your job; the batch system will route your job to the right queue automatically based on your job parameters. There are two exceptions to this, the express and the highmem queues:

express: Jobs will get higher priority than jobs in other queues. Submit with qsub -q express .... Limits: Max walltime is 8 hours, no other resource limits, but there are very strict limits on the number of jobs running etc. (Details)
highmem: Jobs will get access to the nodes with large memory (32GB). Submit with qsub -q highmem .... Limits: Restricted access, send a request to support to get access to this queue. Jobs will be restricted to the 50 nodes with 32GB memory.

Other queues

batch: The default queue. Routes jobs to the queues below.
short: Jobs in this queue are allowed to run on any node, including the highmem nodes. Limits: walltime < 48 hours.
singlenode: Jobs that run within one compute node will end up in this queue. Limits: Only access to nodes without infiniband.
multinode: Contains jobs that span multiple nodes. Limits: None, users can specify if they want infiniband or ethernet nodes.

Again, it is not necessary to ask for any specific queue unless you want to use express or highmem.

7   Relevant examples (also for beginning users)

In addition to the generic job script example, there are application-specific examples in the documentation for the individual applications.

8   Creating dependencies between jobs

See the description of the -Wdepend option in the qsub manpage.
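
As a small sketch of how this option can be used, the function below submits a list of job scripts as a chain in which each job starts only after the previous one has finished successfully (the afterok dependency type). The function name submit_chain and the script names in the usage note are hypothetical.

```shell
# Submit the given job scripts as a chain: each job is held until the
# previous one has exited without error (depend=afterok).
submit_chain() {
  local prev=""
  local script
  for script in "$@"; do
    if [ -z "$prev" ]; then
      # First job: no dependency.
      prev=$(qsub "$script") || return 1
    else
      # Later jobs: depend on the job id returned by the previous qsub.
      prev=$(qsub -W depend=afterok:"$prev" "$script") || return 1
    fi
    echo "$script -> $prev"
  done
}
```

For example, submit_chain prepare.sh compute.sh postprocess.sh would queue three jobs that run one after another. Other dependency types are listed in the qsub manpage.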

9   Combining multiple tasks in a single job

With some shell trickery one can spawn and load-balance multiple independent tasks running in parallel within one node. Run each task in the background, and before spawning the next one, poll until the number of active tasks has dropped below the limit:

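# tasks: whitespace-separated list of task arguments
# maxpartasks: maximum number of tasks to run at the same time
for t in $tasks; do
  ./dowork.sh "$t" &
  # count only the background jobs that are still running (bash builtin)
  activetasks=$(jobs -r | wc -l)
  while [ "$activetasks" -ge "$maxpartasks" ]; do
    sleep 1
    activetasks=$(jobs -r | wc -l)
  done
done
# wait for the remaining tasks to finish
wait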

Complete examples with descriptive comments can be found here: partasks.sh, dowork.sh.

by Roy Einar Dragseth last modified Aug 26, 2010 11:59 AM Notur