
CRUKCI Cluster Transition - Hands-on training


Session 3: Usage of the cluster

Learning Objectives

  • Submit jobs to the cluster with sbatch and monitor them with squeue, sacct and scancel
  • Mount the cluster file systems on your macOS computer using SSHFS
  • Run FastQC on your sequencing data by submitting jobs to the cluster

Submit a job

We previously submitted a very simple job to the cluster using the echo command and a shell script job.sh containing specific Slurm instructions. Please check the instructions on Can I submit jobs onto the cluster? and run it again to make sure you can submit jobs to the cluster and understand how to do it.

In the job.sh script, we have specific SBATCH instructions:
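The exact directives were covered earlier; as a reminder, a minimal job.sh sketch for a simple echo job could look like the one below. The job name, output file and memory values are illustrative, so adjust them for your own job:

#!/bin/bash
#SBATCH --job-name=hello          # name shown in the queue (illustrative)
#SBATCH --output=hello.%j.out     # standard output file, %j is replaced by the job ID
#SBATCH --mem=1024                # requested memory in MB
echo "Hello from the cluster"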

See the sbatch man page for all the options and an explanation of how to submit a batch script to Slurm.

:computer: EXERCISE Go to your Terminal window, or open a new one, and go to session1-data/nelle/.

  • Copy molecules/ onto your scratch space on the cluster using scp -r
  • Submit the middle.sh script to Slurm to extract lines 20-23 of octane.pdb
  • Write a loop to submit jobs for all PDB files by modifying job.sh to take one command line argument, which will be given at each step of the loop (see the sketch after this list)
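One possible pattern for the last step (a sketch, assuming job.sh has been modified to use its first argument in the extraction, e.g. head -n 23 "$1" | tail -n 4):

for pdb in molecules/*.pdb
do
    sbatch job.sh "$pdb"    # pass each PDB file name as the job's argument
done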

:tada: Congratulations! :thumbsup: You did it! :wink:

Get cluster file systems mounted on macOS

FUSE for macOS allows you to extend macOS’s native file handling capabilities via third-party file systems. Combined with SSHFS, it lets you mount file systems accessible via ssh directly on your macOS computer.

Install both FUSE for macOS and SSHFS; they are available from the FUSE for macOS website.

Open a Terminal window and create a mount point in your home directory, for example mnt/scratchb:

cd ~                    # go to home directory
mkdir -p mnt/scratchb   # create intermediate directories as required with -p option
cd mnt/scratchb         # go to this directory
pwd                     # return current working directory name

Mount your cluster directory /mnt/scratchb/my_group/my_username onto your local machine using:

sshfs my_username@clust1-headnode.cri.camres.org:/mnt/scratchb/my_group/my_username /Users/my_username/mnt/scratchb

To list all mounted file systems, use mount:

mount

To unmount scratchb, use:

umount /Users/my_username/mnt/scratchb

Aliases can be created in your ~/.profile to save you time and avoid typing complex commands every time you need them. You can enter these lines at the beginning of your ~/.profile file using your preferred editor, for example atom, which can be launched from the command line:

atom ~/.profile

Add these lines to the file:

### Aliases
alias mntclustsb='sshfs my_username@clust1-headnode.cri.camres.org:/mnt/scratchb/my_group/my_username /Users/my_username/mnt/scratchb'
alias umntclustsb='umount /Users/my_username/mnt/scratchb'

Save the file and open a new Terminal window; your new aliases are now available as commands! :thumbsup:

FastQ quality control

We are now going back to our own data to check read quality using FastQC. Here we will use the command line version of FastQC. The following command displays all the options you can use when running FastQC:

/home/bioinformatics/software/fastqc/fastqc-v0.11.5/fastqc --help

To run FastQC on the sequencing data you downloaded into the SLX-ID folder, you will need the following command. We are not going to run it directly on the head node; instead, we will submit it as a job to the cluster:

/home/bioinformatics/software/fastqc/fastqc-v0.11.5/fastqc -o /scratcha/xxlab/my_username/SLX-ID/ --noextract -f fastq my_file.fastq.gz

If you need to get the sequencing data again, check the steps from Session 1: Shell - Getting sequencing data: Using CRUKCI infrastructure.

The options used here are: -o to set the output directory, --noextract to keep the report zipped, and -f fastq to force the input file format.

:computer: EXERCISE Go to your Terminal window, or open a new one, and log in to the cluster head node.

  • Navigate to your project data.
  • Create a job.sh to run FastQC (see the sketch after this list)
  • Send job to the cluster and wait for the results
  • Update your README.txt file with what you’ve done
  • View the HTML report in a web browser; you may have to copy this file back to your own computer to be able to view it, either with the scp command or by mounting your scratch space using sshfs
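As a starting point, a job.sh sketch for this exercise could look like the one below; the SBATCH values are illustrative, and xxlab, my_username, SLX-ID and my_file.fastq.gz are placeholders for your own group, username, project folder and file:

#!/bin/bash
#SBATCH --job-name=fastqc
#SBATCH --output=fastqc.%j.out    # %j is replaced by the job ID
#SBATCH --mem=2048                # requested memory in MB
/home/bioinformatics/software/fastqc/fastqc-v0.11.5/fastqc \
    -o /scratcha/xxlab/my_username/SLX-ID/ \
    --noextract -f fastq \
    /scratcha/xxlab/my_username/SLX-ID/my_file.fastq.gz

Submit it from the head node with sbatch job.sh.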

:tada: Congratulations! :thumbsup: You did it! :wink:

When running jobs on the cluster, you may wish to keep an eye on the output by using tail -f, which outputs appended data as the file grows:

tail -f my_output_file_name.out

Check that your job is still running and in the queue using squeue; you can combine it with grep my_username to only show information about your jobs:

squeue | grep my_username

You can display information about all your submitted jobs using sacct. Most importantly, you need to check that your job has completed using:

sacct -j JobID

If you wish to kill your job before it completes, run scancel using:

scancel JobID

:computer: EXERCISE Go to your Terminal window, or open a new one, and log in to the cluster head node.

  • Navigate to your project data.
  • Run FastQC for all the sequencing files in your project by sending one job per file to the cluster (see the sketch after this list)
  • Wait for the results
  • Update your README.txt file with what you’ve done
  • View the HTML reports in a web browser; you may have to copy these files back to your own computer to be able to view them, either with the scp command or by mounting your scratch space using sshfs
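One way to send one job per file (a sketch, assuming job.sh has been modified to take the FastQ file name as its first argument and pass it to fastqc):

for fq in /scratcha/xxlab/my_username/SLX-ID/*.fastq.gz
do
    sbatch job.sh "$fq"    # one FastQC job per FastQ file
done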

:tada: Congratulations! :thumbsup: You did it! :wink:

Take home message: everyday cluster commands

sbatch job.sh
sacct -j JobID
tail -f my_output_file_name.JobID.out

Reference materials