- Check the location of the current directory using the command
pwd
- If the current directory is not Course_Materials, then navigate to the Course_Materials directory using the cd (change directory) command:
cd ~/Course_Materials/RNAseq
- Use ls to list the contents of the directory. There should be a directory called fastq.
- Use ls to list the contents of the fastq directory:
ls fastq
SRR7657883.sra_1.fastq.gz SRR7657883.sra_2.fastq.gz
You should see two fastq files. These are the files for read 1 and read 2 of one of the samples we will be working with.
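A quick sanity check on paired fastq files is to count the reads in each: every fastq record occupies four lines, and read 1 and read 2 should contain the same number of records. The following is only a sketch; the helper name count_reads is hypothetical (it is not part of the course materials) and it assumes gzip is available on the system.

```shell
# Hypothetical helper: count the reads in a gzipped fastq file.
# Each fastq record spans exactly four lines, so reads = lines / 4.
count_reads() {
    local n_lines
    n_lines=$(gzip -dc "$1" | wc -l)
    echo $(( n_lines / 4 ))
}

# Example usage (run from ~/Course_Materials/RNAseq):
#   count_reads fastq/SRR7657883.sra_1.fastq.gz
#   count_reads fastq/SRR7657883.sra_2.fastq.gz
# For paired-end data the two counts are expected to match.
```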
- Create a new directory for the QC results called QC using the mkdir command:
mkdir QC
\(\Rightarrow\) QC
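If you repeat the exercise, QC may already exist, and a plain mkdir QC will then report an error. A minimal sketch of a re-runnable variant, using the standard -p option of mkdir:

```shell
# -p: create the directory only if it is missing; no error if it already exists
mkdir -p QC
# Confirm that QC is present and is a directory
ls -ld QC
```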
- Run fastqc on one of the fastq files:
fastqc fastq/SRR7657883.sra_1.fastq.gz
\(\Rightarrow\) SRR7657883.sra_1_fastqc.html
\(\Rightarrow\) SRR7657883.sra_1_fastqc.zip
- The previous command has written the report to the fastq directory - the default behaviour for fastqc. We want it in the QC directory.
- Use the rm (remove) command to delete the report. Because the report was written to the fastq directory, include that directory in the path:
rm fastq/SRR7657883.sra_1_fastqc.html
- Also delete the associated zip file (this contains all the figures and the data tables for the report)
rm -f fastq/SRR7657883.sra_1_fastqc.zip
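The two deletions can also be expressed with wildcards, which is handy once there are reports for many samples. This is a sketch only: the helper name clean_fastqc_reports is hypothetical, and it assumes the reports use FastQC's _fastqc.html and _fastqc.zip suffixes as seen in this exercise.

```shell
# Hypothetical helper: delete FastQC outputs (*_fastqc.html, *_fastqc.zip)
# from the given directory, leaving the fastq files themselves untouched.
# -f suppresses the error when no matching file exists.
clean_fastqc_reports() {
    rm -f "$1"/*_fastqc.html "$1"/*_fastqc.zip
}

# Example usage:
#   clean_fastqc_reports fastq
```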
- Run FastQC again, but this time:
  - have FastQC analyse both fastq files at the same time. You will need to add -t 2 before the sequence file names. See fastqc --help to find out about this option.
  - try to use the -o option to have the reports written to the QC directory.
fastqc -t 2 -o QC fastq/SRR7657883.sra_1.fastq.gz fastq/SRR7657883.sra_2.fastq.gz
Or, more simply, we can use the * wildcard to match both file names:
fastqc -t 2 -o QC fastq/SRR7657883.sra_*.fastq.gz
\(\Rightarrow\) QC/SRR7657883.sra_1_fastqc.html
\(\Rightarrow\) QC/SRR7657883.sra_1_fastqc.zip
\(\Rightarrow\) QC/SRR7657883.sra_2_fastqc.html
\(\Rightarrow\) QC/SRR7657883.sra_2_fastqc.zip
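The shell expands a wildcard before the command runs, so it can be useful to preview the expansion with echo before passing it to a program. A minimal sketch; the temporary directory and the empty placeholder files are assumptions made only so the example is self-contained.

```shell
# Build a throwaway directory holding two empty placeholder files
demo=$(mktemp -d)
mkdir "$demo/fastq"
touch "$demo/fastq/SRR7657883.sra_1.fastq.gz" "$demo/fastq/SRR7657883.sra_2.fastq.gz"

# echo prints the arguments a command would receive after the shell
# has expanded the * pattern (run in a subshell so we do not change directory)
expanded=$(cd "$demo" && echo fastq/SRR7657883.sra_*.fastq.gz)
echo "$expanded"
```

In the course directory itself, the equivalent preview would be echo fastq/SRR7657883.sra_*.fastq.gz.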
- Open the html report in a browser and see if you can answer these questions:
A) What is the read length?
150
B) Does the quality score vary through the read length?
Yes, the first few bases and the last few bases are typically of lower quality.
C) How is the data's quality?
Overall, pretty good.
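For a quick overview without opening the browser, the PASS/WARN/FAIL flags can also be tallied from the summary.txt file inside the zip archive. This is a sketch under assumptions: the helper name tally_fastqc_status is hypothetical, and it assumes the usual FastQC layout in which summary.txt is tab-separated with the status in the first column.

```shell
# Hypothetical helper: tally PASS/WARN/FAIL from FastQC summary.txt
# content on stdin (tab-separated: status, module name, file name).
tally_fastqc_status() {
    cut -f 1 | sort | uniq -c
}

# Example usage, assuming the report zip from the previous step exists:
#   unzip -p QC/SRR7657883.sra_1_fastqc.zip SRR7657883.sra_1_fastqc/summary.txt | tally_fastqc_status
```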