BC3203

How to deal with long running commands

Most bioinformatic analyses on large datasets will involve at least one step that takes a long time to run. Long-running commands pose special problems. In particular we would like to:

Unfortunately, if you simply run a long-running command using the Run button in an RMarkdown document, you won’t achieve any of the goals listed above.

Unless your command runs very quickly, you will need to package it up into a slurm script and run it using the slurm queuing system on the RStudio server.

Here’s how to do it.

Running Jobs with SLURM

Let’s assume we want to run the following command using a slurm script

echo "Hello SLURM"

First create a file for your script. You can do this within RStudio by choosing File -> New File -> Shell Script. Give your file a name that reflects the purpose of your script. In this case hello.sh would be appropriate.

Now open your script file using the RStudio code editor (click the file in the Files browser) and enter the following text

#!/bin/bash
#SBATCH --time=60
#SBATCH --ntasks=2 --mem=4gb

echo "Hello SLURM"
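If you prefer to stay in the Terminal, the same file can be created with a here-document. This is a minimal sketch rather than part of the guide's workflow: it assumes you are in your project directory and that overwriting any existing hello.sh is acceptable. The `bash -n` step only checks the script for syntax errors without running it. Note that a bare number in `--time=60` is read by SLURM as minutes.

```shell
# Write the example script to hello.sh (this overwrites an existing hello.sh)
cat > hello.sh <<'EOF'
#!/bin/bash
#SBATCH --time=60
#SBATCH --ntasks=2 --mem=4gb

echo "Hello SLURM"
EOF

# Parse the script without executing it; a non-zero exit status means a syntax error
bash -n hello.sh && echo "hello.sh: syntax OK"
```

To bash the `#SBATCH` lines are ordinary comments, so the syntax check ignores them; only `sbatch` reads them as job settings.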

Now you are ready to run your script. Open a Terminal window and enter the following command

sbatch hello.sh

You should see a response something like

Submitted batch job XX

Where XX is a number. This is your job number. You should also see a file appear in your project called slurm-XX.out (again where XX is your job number).
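If you want to use the job number in later commands without retyping it, one option is to capture the response from sbatch and pull the number out of it. The sketch below assumes the response has the form shown above, with the job number as its last word; `job_id_from_response` is a hypothetical helper name, not a SLURM command.

```shell
# Print the job number from a response such as "Submitted batch job 42".
# Assumes the job number is the last space-separated word of the response.
job_id_from_response() {
    printf '%s\n' "${1##* }"
}

# Usage on the server, where sbatch is available:
#   response=$(sbatch hello.sh)
#   job_id=$(job_id_from_response "$response")
#   tail "slurm-${job_id}.out"
```

sbatch also has a `--parsable` option that prints the job number without the surrounding text, which may be simpler if you only need the number.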

You can check progress of your job in a couple of ways.

  1. Take a look at output in slurm-XX.out
    tail slurm-XX.out
    
  2. Look at where your job is in the queue
    squeue
    

If your job is finished it will disappear from the queue, so one way to definitively check for job completion is to run squeue and check to see if your job is there. Note that during busy times there might be several jobs from other users in the queue. You should be able to tell which job is yours because it will be marked with your user name.
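The check described above can be automated with a small polling loop. This is a sketch under a few assumptions: `job_in_queue` and `wait_for_job` are hypothetical helper names, the 30 second default interval is an arbitrary choice, and a job is treated as finished as soon as squeue prints nothing for it.

```shell
# Succeeds while squeue still lists the given job number.
# -h suppresses the header line; -j restricts the listing to one job.
job_in_queue() {
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# Poll the queue until the job has left it.
# $1 is the job number; $2 is the number of seconds between checks (default 30).
wait_for_job() {
    while job_in_queue "$1"; do
        sleep "${2:-30}"
    done
    echo "Job $1 is no longer in the queue"
}

# Usage on the server:
#   wait_for_job XX        (replace XX with your job number)
```

To list only your own jobs when the queue is busy, `squeue -u "$USER"` filters the output by user name.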

Troubleshooting slurm scripts

Sometimes your slurm script will not run, or it might crash before it has finished. Some common reasons are:

  1. Your job crashed or didn’t finish because it ran out of memory. In the top part of your slurm script you need to indicate how much memory and how many CPUs your job will need.
    #SBATCH --ntasks=2 --mem=4gb
    

    Here ntasks should be set to the number of CPUs that the job will use. If you are in doubt leave this value at 2. At some points in the guide you will be instructed to set this to a certain value. mem indicates the amount of memory required. Again, follow the guide here. If you need more than 4gb of memory the guide will tell you what is needed.

  2. Your script didn’t run because a program wasn’t available. Usually this won’t be a problem because all programs on our server are centrally installed. There are a few exceptions though. One exception is qiime which is not a normal command, but an alias. If you enter qiime within a slurm script you might see an error like this
    slurm_script: line 5: qiime: command not found
    

    This can be fixed by adding the alias into your script. So for qiime commands you need to add the following code in your script before the qiime command

    shopt -s expand_aliases
    alias qiime='apptainer run -B /pvol/:/pvol /pvol/data/sif/qiime.sif qiime'
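An alternative to the alias, offered here as a sketch rather than as the guide's method, is to define qiime as a shell function. Functions are available in non-interactive scripts without `shopt -s expand_aliases`. The container path and bind mount below are copied from the alias above; nothing else about the server setup is assumed.

```shell
# Define qiime as a function that forwards all of its arguments to the container
qiime() {
    apptainer run -B /pvol/:/pvol /pvol/data/sif/qiime.sif qiime "$@"
}

# Later in the same script, call it like the normal command, for example:
#   qiime --help
```

The function definition must appear in the script before the first qiime command, just as the alias would.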