20 Batch job submission
- 1 Create a job
- 2 Choosing network
- 3 Manage a job
- 4 List of useful commands
- 5 List of useful job script parameters
- 6 List of classes/queues, incl. short description and limitations
- 7 Relevant examples (also for beginning users)
- 8 Creating dependencies between jobs
- 9 Combining multiple tasks in a single job
1 Create a job
To run a job on the system you need to create a job script. A job script is a regular shell script (bash or csh) with directives that specify the number of CPUs, the amount of memory, etc. The batch system interprets these directives when the job is submitted. See here for a complete job script example with comments.
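As a rough illustration of what such a script can look like, here is a minimal sketch. The job name, node count, `ppn` value, walltime and memory request are all placeholder assumptions and should be adapted to the actual job; the complete commented example referred to above remains the authoritative reference.

```shell
#!/bin/bash
# Directives for the batch system; bash itself treats them as comments.
# All values below are placeholders chosen for illustration.
#PBS -N example_job
#PBS -lnodes=2:ppn=8
#PBS -lwalltime=01:00:00
#PBS -lpmem=1000MB

# Torque sets PBS_O_WORKDIR to the directory the job was submitted from.
# Fall back to the current directory when the script is run by hand.
workdir=${PBS_O_WORKDIR:-$PWD}
cd "$workdir" || exit 1

echo "Running in $workdir on $(hostname)"
```

Because the directives are ordinary comments, the script can also be run directly in a shell for a quick test before it is submitted.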
2 Choosing network
The Stallo cluster has two types of network: InfiniBand and gigabit Ethernet. If your application needs a fast network you should use InfiniBand. The network type is selected by adding the appropriate job parameters to the job script or to the command line. For instance:
| infiniband: | qsub -lnodes=64:ib ......... |
|---|---|
| ethernet: | qsub -lnodes=64:gige ....... |
If you do not specify a network you will get whichever becomes available first. To check whether your application needs a fast interconnect, run the same job on both networks and see if the runtime differs significantly.
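One way to set up such a comparison is to generate the two submit commands from a single template. The sketch below only prints the commands; the helper name `network_cmd`, the node count of 4, the job name prefix and `jobscript.sh` are assumptions made for illustration.

```shell
#!/bin/sh
# Build a qsub command line that requests a given network type.
# $1 = network (ib or gige), $2 = number of nodes, $3 = job script
network_cmd() {
    echo "qsub -N nettest_$1 -lnodes=$2:$1 $3"
}

# Print one command per network type; run the printed commands to submit.
# The node count 4 and jobscript.sh are placeholders.
for net in ib gige; do
    network_cmd "$net" 4 jobscript.sh
done
```

Giving the two jobs different names makes it easier to tell them apart in the queue listing when comparing their runtimes.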
3 Manage a job
A job's lifecycle can be managed with as few as three commands:
- Submit the job with qsub jobscript.sh.
- Check the job status with showq. To limit the display to your own jobs, use showq -u username.
- (optional) Delete the job with qdel jobid.
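The three steps can be strung together in a small wrapper, sketched below. It assumes that qsub prints a job ID of the form number.servername, which is the usual Torque behaviour; the helper name `short_jobid` and the file name `jobscript.sh` are placeholders. The submit part only runs when qsub and the job script are actually present.

```shell
#!/bin/sh
# Reduce a full job ID such as 12345.servername to its numeric part.
short_jobid() {
    echo "${1%%.*}"
}

# Only talk to the batch system if qsub exists and the job script is present.
if command -v qsub >/dev/null 2>&1 && [ -f jobscript.sh ]; then
    jobid=$(qsub jobscript.sh)     # submit and remember the job ID
    echo "Submitted job $(short_jobid "$jobid")"
    showq -u "$USER"               # list only your own jobs
    # qdel "$jobid"                # uncomment to delete the job again
fi
```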
4 List of useful commands
4.1 Torque commands
See the man page for each command for details.
| qsub: | Submit jobs. All job parameters can be specified on the command line or in the job script. Command line arguments take precedence over directives in the script. |
|---|---|
| qstat: | Show jobs in the queue. Jobs will be sorted by submit order. |
| qdel: | Delete a job. Use qdel all to terminate all your jobs immediately. |
4.2 Maui commands
For details run the command with the -h option.
| showq: | List jobs in the queue. Jobs will be sorted by time to completion. To see only the jobs of a specific user, use showq -u username. |
|---|---|
| checkjob: | Show details about a specific job. |
| checknode: | Show details about the state of a specific compute node. |
6 List of classes/queues, incl. short description and limitations
In general it is not necessary to specify a queue for your job; the batch system will route your job to the right queue automatically, based on your job parameters. There are two exceptions to this, the express and the highmem queues:
| express: | Jobs will get higher priority than jobs in other queues. Submit with qsub -q express .... Limits: Max walltime is 8 hours, no other resource limits, but there are very strict limits on the number of jobs running etc. (Details) |
|---|---|
| highmem: | Jobs will get access to the nodes with large memory (32GB). Submit with qsub -q highmem .... Limits: Restricted access, send a request to support to get access to this queue. Jobs will be restricted to the 50 nodes with 32GB memory. |
Other queues
| batch: | The default queue. Routes jobs to the queues below. |
|---|---|
| short: | Jobs in this queue are allowed to run on any node, including the highmem nodes. Limits: walltime < 48 hours. |
| singlenode: | Jobs that will run within one compute node will end up in this queue. Limits: Only access to nodes without infiniband. |
| multinode: | Contains jobs that span multiple nodes. Limits: None, users can specify if they want infiniband or ethernet nodes. |
Again, it is not necessary to ask for a specific queue unless you want to use express or highmem.
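Since the express queue only accepts jobs with a walltime of at most 8 hours, it can be handy to check a requested walltime before adding -q express. The following is a minimal sketch of such a check; the helper names `walltime_to_seconds` and `fits_express`, the example walltime and `jobscript.sh` are assumptions, and the script only prints the resulting command instead of submitting anything.

```shell
#!/bin/sh
# Convert a walltime given as HH:MM:SS into seconds.
walltime_to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1        # split on ':' into hours, minutes and seconds
    IFS=$old_ifs
    # Strip one leading zero so that "08" is not read as an octal number.
    h=${1#0}; m=${2#0}; s=${3#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Succeed if the walltime is within the 8 hour limit of the express queue.
fits_express() {
    [ "$(walltime_to_seconds "$1")" -le 28800 ]
}

walltime=02:00:00    # placeholder value
if fits_express "$walltime"; then
    echo "qsub -q express -lwalltime=$walltime jobscript.sh"
else
    echo "qsub -lwalltime=$walltime jobscript.sh"
fi
```

The check covers the walltime limit only; the limits on the number of running express jobs are enforced by the batch system itself.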
7 Relevant examples (also for beginning users)
In addition to the generic job script example, there are application-specific examples in the documentation for the individual applications.
8 Creating dependencies between jobs
See the description of the -Wdepend option in the qsub manpage.
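As a brief sketch of how this option is typically used: the dependency type afterok makes a job wait until the listed jobs have finished successfully, and several job IDs can be joined with colons. The helper name `depend_afterok`, the file names `first.sh` and `second.sh` and the example job IDs are assumptions for illustration; the qsub manpage remains the reference for the available dependency types.

```shell
#!/bin/sh
# Build the -W option that makes a job wait until all the listed
# jobs have finished successfully.
depend_afterok() {
    dep="depend=afterok"
    for id in "$@"; do
        dep="$dep:$id"
    done
    echo "-W $dep"
}

# Submit only when qsub and both (placeholder) job scripts are available;
# otherwise just show what the option looks like for two made-up job IDs.
if command -v qsub >/dev/null 2>&1 && [ -f first.sh ] && [ -f second.sh ]; then
    first=$(qsub first.sh)
    qsub $(depend_afterok "$first") second.sh
else
    depend_afterok 1001.example 1002.example
fi
```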
9 Combining multiple tasks in a single job
With a little shell scripting you can spawn and load-balance multiple independent tasks running in parallel within one node: start each task in the background, then poll until one of the running tasks has finished before spawning the next:
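    for t in $tasks; do
        # Start the task in the background.
        ./dowork.sh "$t" &
        # Count the background tasks that are currently active.
        activetasks=$(jobs | wc -l)
        # Wait for a free slot before starting the next task.
        while [ "$activetasks" -ge "$maxpartasks" ]; do
            sleep 1
            activetasks=$(jobs | wc -l)
        done
    done
    # Wait for the remaining tasks to finish.
    wait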
Complete examples with descriptive comments can be found here: partasks.sh, dowork.sh.
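For comparison, a similar effect can be obtained with xargs and its -P option (available in GNU and BSD xargs), which keeps a fixed number of processes running. This is an alternative technique, not the method used in partasks.sh. In the sketch below the task list, the limit of 3 parallel tasks and the inline echo command standing in for ./dowork.sh are all assumptions.

```shell
#!/bin/sh
tasks="1 2 3 4 5 6"   # placeholder task list
maxpartasks=3         # placeholder limit on parallel tasks

# Run one process per task, at most $maxpartasks at a time.
# The inline sh -c command is a stand-in for ./dowork.sh.
run_tasks() {
    printf '%s\n' $tasks |
        xargs -n 1 -P "$maxpartasks" sh -c 'echo "task $1 finished"' _
}

# Tasks finish in arbitrary order, so sort the output for display.
results=$(run_tasks | sort)
echo "$results"
```

The polling loop has the advantage of relying only on shell built-ins, while the xargs variant is shorter and starts the next task as soon as a slot becomes free.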

