1. Using FastQC to QC raw reads

Exercise 1

    1. Check the location of the current directory using the command pwd
    2. If the current directory is not Course_Materials, then navigate to the Course_Materials directory using the cd (change directory) command:
cd ~/Course_Materials
    3. Use ls to list the contents of the directory. There should be a directory called fastq
    4. Use ls to list the contents of the fastq directory:
ls fastq

SRR7657883.sra_1.fastq.gz SRR7657883.subset_2M.sra_1.fastq.gz
SRR7657883.sra_2.fastq.gz Test_adapter_contamination.gq.gz
SRR7657883.subset_2M.sra_2.fastq.gz

You should see two fastq files called SRR7657883.sra_1.fastq.gz and SRR7657883.sra_2.fastq.gz. These are the files for read 1 and read 2 of one of the samples we will be working with.
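Because the two files hold read 1 and read 2 of the same sample, they would normally be expected to contain the same number of records. The sketch below is one way to check this from the command line; count_reads is a hypothetical helper name, not part of the course materials, and it assumes the usual four-lines-per-record fastq layout.

```shell
# Hypothetical helper: count the records in a gzipped fastq file.
# Assumes every record occupies exactly four lines.
count_reads() {
    # Decompress to stdout, count lines, divide by four
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}
```

Running count_reads on fastq/SRR7657883.subset_2M.sra_1.fastq.gz and again on fastq/SRR7657883.subset_2M.sra_2.fastq.gz should print the same number if the pair is intact. The subset files are suggested here only because they are quicker to decompress than the full files.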

  5. Run fastqc on one of the fastq files:
fastqc fastq/SRR7657883.sra_1.fastq.gz  

This creates two files in the fastq directory. The first is the QC report in html format and the second is a zip file containing the summary data used to generate the report.

> \(\Rightarrow\) SRR7657883.sra_1_fastqc.html
> \(\Rightarrow\) SRR7657883.sra_1_fastqc.zip
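If you would rather keep the reports separate from the raw reads, FastQC accepts an --outdir option and more than one input file on the same command line. The following is a minimal sketch of that idea; run_fastqc is a hypothetical helper name and the qc directory used in the usage note is an assumption, not something the exercise requires.

```shell
# Hypothetical helper: run FastQC on one or more fastq files and
# write the reports to a separate directory.
run_fastqc() {
    outdir="$1"
    shift
    # FastQC does not create the output directory itself, so make it first
    mkdir -p "$outdir"
    fastqc --outdir "$outdir" "$@"
}
```

For example, run_fastqc qc fastq/SRR7657883.sra_1.fastq.gz fastq/SRR7657883.sra_2.fastq.gz would place the html and zip files for both reads in a directory called qc rather than in fastq.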

  6. Open the html report in a browser and see if you can answer these questions:
    A) What is the read length?
    The read length is 150 bases.
    B) Does the quality score vary through the read length?
    Yes. In the Per base sequence quality plot, the first few bases and the last few bases are typically of lower quality.
    C) How is the data’s quality?
    Overall, pretty good.
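The read length reported by FastQC can also be cross-checked directly against the fastq file. The sketch below tallies the length of every sequence line; read_lengths is a hypothetical helper name, and it assumes the standard four-line fastq record in which the sequence is the second line.

```shell
# Hypothetical helper: tally read lengths in a gzipped fastq file.
# Prints one "length count" pair per line, sorted by length.
read_lengths() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { n[length($0)]++ }      # sequence is line 2 of each record
             END { for (l in n) print l, n[l] }' |
        sort -n
}
```

Calling read_lengths fastq/SRR7657883.subset_2M.sra_1.fastq.gz prints each distinct read length alongside the number of reads of that length, which can be compared with the sequence length shown in the report.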