- Create concatenated transcriptome/genome reference file
cat references/Mus_musculus.GRCm38.cdna.chr14.fa.gz \
    references/Mus_musculus.GRCm38.dna_sm.chr14.fa.gz \
    > references/gentrome.chr14.fa.gz
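A plain cat works on the compressed files because the gzip format allows several compressed members to be joined into one stream, which decompresses to the concatenation of the originals. The following is a minimal sketch to check this, using two toy FASTA files in a temporary directory (the file names and sequences are invented for illustration and are not part of the course data):

```shell
# Toy demonstration: concatenating gzip files yields a valid gzip file.
tmpdir=$(mktemp -d)

# Hypothetical miniature "transcriptome" and "genome" FASTA files
printf '>tx1\nACGT\n>tx2\nGGCC\n' | gzip -c > "$tmpdir/cdna.fa.gz"
printf '>14\nACGTGGCCTTAA\n' | gzip -c > "$tmpdir/dna.fa.gz"

# Same pattern as the gentrome command: transcripts first, genome second
cat "$tmpdir/cdna.fa.gz" "$tmpdir/dna.fa.gz" > "$tmpdir/gentrome.fa.gz"

# Count the FASTA headers in the combined file and look at the last one
n_headers=$(gunzip -c "$tmpdir/gentrome.fa.gz" | grep -c '^>')
last_header=$(gunzip -c "$tmpdir/gentrome.fa.gz" | grep '^>' | tail -n 1)
echo "sequences: $n_headers, last: $last_header"
```

The order of the two inputs is deliberate: as far as the salmon documentation describes it, the decoy (genome) sequences are expected to come after the transcripts in the combined file.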
- Create decoy sequence list from the genomic fasta
echo "14" > references/decoys.txt
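Writing 14 by hand works here because the genome FASTA holds a single chromosome. For a reference with many sequences, the names can be pulled from the FASTA headers instead. The following is a hedged sketch run on a toy file; the toy file and the make_decoys helper name are made up for this example, and it assumes the sequence name is the first space-delimited word of each header line:

```shell
# make_decoys GENOME_FA_GZ OUT_TXT
# Writes one sequence name per line, taken from the FASTA header lines.
make_decoys() {
    gunzip -c "$1" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' > "$2"
}

# Toy genome with Ensembl-style headers (invented for illustration)
tmpdir=$(mktemp -d)
printf '>14 dna_sm:chromosome\nACGT\n>MT dna_sm:chromosome\nGGCC\n' \
    | gzip -c > "$tmpdir/genome.fa.gz"

make_decoys "$tmpdir/genome.fa.gz" "$tmpdir/decoys.txt"
cat "$tmpdir/decoys.txt"
```

On the course data the call would be make_decoys references/Mus_musculus.GRCm38.dna_sm.chr14.fa.gz references/decoys.txt, which should give the same single line as the echo command if the chromosome header starts with >14.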
- Use salmon index to create the index. You will need to provide three pieces of information:
- the Transcript fasta file - references/gentrome.chr14.fa.gz
- the decoys - references/decoys.txt
- the salmon index - a directory to write the index to, use references/salmon_index_chr14
Also add -p 7 to the command to instruct salmon to use 7 threads/cores. To find the flags for the other three pieces of information use:
salmon index --help
Version Info: This is the most recent version of salmon.

Index
==========
Creates a salmon index.

Command Line Options:
  -v [ --version ]              print version string
  -h [ --help ]                 produce help message
  -t [ --transcripts ] arg      Transcript fasta file.
  -k [ --kmerLen ] arg (=31)    The size of k-mers that should be used for the quasi index.
  -i [ --index ] arg            salmon index.
  --gencode                     This flag will expect the input transcript ...
  ...
  ...
  -d [ --decoys ] arg           Treat these sequences ids from the reference as the decoys that may have sequence homologous to some known transcript. for example in case of the genome, provide a list of chromosome name --- one per line
salmon index \
    -t references/gentrome.chr14.fa.gz \
    -d references/decoys.txt \
    -p 7 \
    -i references/salmon_index_chr14
- There should already be a directory called salmon_output in the Course_materials directory. If not, create it.
mkdir salmon_output
- Use salmon quant to quantify the gene expression from the raw fastq. To see all the options run salmon quant --help-reads. There are a lot of possible parameters; we will need to provide the following:
- salmon index - references/salmon_index
- -l A - Salmon needs some information about the library preparation. We could give this explicitly, but it is easy enough for Salmon to Automatically infer it from the data.
- File containing the #1 mates - fastq/SRR7657883.subset_2M.sra_1.fastq.gz
- File containing the #2 mates - fastq/SRR7657883.subset_2M.sra_2.fastq.gz
- Output quantification directory - salmon_output/SRR7657883
- --writeMappings=salmon_output/SRR7657883/SRR7657883.salmon.sam - Instructs Salmon to output the read alignments in SAM format to the file salmon_output/SRR7657883/SRR7657883.salmon.sam.
- --gcBias - Salmon can optionally correct for GC content bias; it is recommended to always use this
- The number of threads to use - 7
salmon quant \
    -i references/salmon_index \
    -l A \
    -1 fastq/SRR7657883.subset_2M.sra_1.fastq.gz \
    -2 fastq/SRR7657883.subset_2M.sra_2.fastq.gz \
    -o salmon_output/SRR7657883 \
    --writeMappings=salmon_output/SRR7657883/SRR7657883.salmon.sam \
    --gcBias \
    -p 7
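Once the run finishes, salmon writes its estimates to a tab-separated file called quant.sf inside the output directory, with the columns Name, Length, EffectiveLength, TPM and NumReads. As a quick sanity check you can list the most highly expressed transcripts. The sketch below runs on a toy quant.sf; the top_tpm helper name, the transcript names and the values are all invented for illustration:

```shell
# top_tpm QUANT_SF N
# Prints the name and TPM of the N transcripts with the highest TPM.
top_tpm() {
    # Skip the header line, then sort descending on column 4 (TPM)
    tail -n +2 "$1" | sort -t "$(printf '\t')" -k4,4gr | head -n "$2" | cut -f 1,4
}

# Toy quant.sf with made-up transcripts and values
tmpdir=$(mktemp -d)
{
    printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n'
    printf 'tx1\t1000\t850.0\t12.5\t40\n'
    printf 'tx2\t500\t350.0\t250.0\t300\n'
    printf 'tx3\t2000\t1850.0\t0.0\t0\n'
} > "$tmpdir/quant.sf"

top_tpm "$tmpdir/quant.sf" 2
```

For the sample quantified here the call would be top_tpm salmon_output/SRR7657883/quant.sf 10.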
- Sort and transform your aligned SAM file into a BAM file called SRR7657883.salmon.sorted.bam. Use the option -@ 7 to use 7 cores; this vastly speeds up the compression.
samtools sort \
  -@ 7 \
  -O BAM \
  -o salmon_output/SRR7657883/SRR7657883.salmon.sorted.bam \
  salmon_output/SRR7657883/SRR7657883.salmon.sam
salmon_output/SRR7657883/SRR7657883.salmon.sam ⇒ salmon_output/SRR7657883/SRR7657883.salmon.sorted.bam
- Check your BAM file
samtools view salmon_output/SRR7657883/SRR7657883.salmon.sorted.bam | more
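Paging through the records is a good visual check. A few scripted checks can complement it. The following is a sketch only: the check_bam helper name and its return codes are made up for this example, while samtools quickcheck, samtools view -H and samtools view -c are standard samtools subcommands and options.

```shell
# check_bam BAM
# Returns 1 if the file is missing or empty, 2 if samtools is not on PATH,
# 3 if samtools quickcheck rejects the file, 0 otherwise.
check_bam() {
    bam=$1
    [ -s "$bam" ] || { echo "missing or empty: $bam" >&2; return 1; }
    command -v samtools >/dev/null 2>&1 \
        || { echo "samtools not found on PATH" >&2; return 2; }
    # quickcheck tests that the file looks intact (valid header, EOF block)
    samtools quickcheck "$bam" || { echo "quickcheck failed: $bam" >&2; return 3; }
    # After samtools sort the @HD header line is expected to carry SO:coordinate
    samtools view -H "$bam" | grep '^@HD'
    # Number of alignment records in the file
    samtools view -c "$bam"
}

# Example call (run from the Course_materials directory):
# check_bam salmon_output/SRR7657883/SRR7657883.salmon.sorted.bam
```

A non-zero return value points at which stage failed, which is handy if the check is later placed inside a loop over several samples.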