In our intial QC of the raw fastq file we will be interested in gathering various metrics, such as the total number of reads, sequence length, or GC content. We will also want to summarise such things as base quality scores and make assessments of the contamination of the reads with adapter sequence.

FastQC is quality control tool for high throughput sequence data that is maintained by the Babraham Institute. It is free to download and use. It runs a number QC analyses on sequencing data (in various formats, not just fastq) and summarises the results in a easy to read report.

The basic command to run FastQC is simply fastqc.

Access the help page to find the basic usage an other options:

fastqc --help

The Usage is:

    fastqc seqfile1 seqfile2 .. seqfileN

    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam]
           [-c contaminant file] seqfile1 .. seqfileN

The simplest way to use it is just to type fastqc followed by all the sequence files that you wish to QC. It will then run through as many files as you provide generaing a report for each one.

There are many additional options that you can provide to modify the behavoiour of the programme. The most common one is -o output_directory. By default the report is written to the same directory as the fastqc file, however, if you would like to gather the QC in a different directory, you can specify this using the -o flag followed by the name of the directory, e.g:

fastqc -o QC fastq/my_fastq_file.fastq.gz

In this case we wish to generate a report for the file my_fastq_file.fastq.gz, which is in the folder fastq, and to have the report written into a directory called QC.

Note that the output directory must already exist, FastQC will not create it.

Exercise

    1. Check the location of the current directory using the command pwd
    2. If the current directory is not /home/participant/Course_Materials, then navigate to the Course_Materials directory using the cd (change directory) command:
      cd ~/Course_Materials
    1. Use ls to list the contents of the directory. There should be directory called fastq
    2. Use ls to list the contents of the fastq directory:
      ls fastq
  1. Create a new directory for the QC results called QC using the mkdir command:
    mkdir QC

  2. Run fastqc on one of our samples:
    fastqc fastq/MCL1.DL.fastq.gz

  3. The previous command has written the report to the fastq directory - the default behaviour for fastqc. We want it in the QC directory.
    1. Use the rm (remove) command to delete the report:
      rm fastq/MCL1.DL_fastqc.html
    2. Also delete the associated zip file (this contains all the figures and data tables for the report)
  4. Run the FastQC again, but this time try to use the -o option to have the report written to the QC directory.

  5. Open the html report in a browser and see if you can answer these questions:
    A) What is the read length?
    B) Does the quality score vary through the read length?
    C) How is the dataโ€™s quality?