Contents

0.1 Peak Analysis

0.1.1 Set working directory

bash

cd /home/participant/Course_Materials/ChIPSeq/Materials/

#mkdir motifs if it doesn't exist

cd motifs

0.1.2 Extract region around peak summit

R

# read in the peak summit file
 
setwd("/home/participant/Course_Materials/ChIPSeq/Materials/motifs")
peakfile <-read.table("/home/participant/Course_Materials/ChIPSeq/Materials/Peaks/TAp73beta_r2.fastq_trimmed.fastq_sorted_filterdup.bed")


# get a 400bp window around summit (-200 and +200)
Motif_Peaks <- data.frame((peakfile$V1), (peakfile$V2-250), (peakfile$V2+250), stringsAsFactors = FALSE)

# create bed file
options(scipen=999)

write.table(Motif_Peaks, file= "/home/participant/Course_Materials/ChIPSeq/Materials/motifs/Peaks_for_motif_detection.bed", 
            row.names=FALSE,col.names = FALSE, sep="\t", quote =FALSE)

dir()

0.1.3 ChIP peak annotation

R

library("ChIPpeakAnno")
library("GenomicRanges")

#library("rtracklayer")
#bed <- import("/home/participant/Course_Materials/ChIPSeq/Materials/motifs/Peaks_for_motif_detection.bed")
#gr1 <- GRanges(seqnames=bed$V1, ranges=IRanges(start=bed$V2, end=bed$V3)) 


df1<-read.table("Peaks_for_motif_detection.bed", header=FALSE)

gr1 <- GRanges(seqnames=df1$V1, ranges=IRanges(start=df1$V2, end=df1$V3))



##annotation


#source("https://bioconductor.org/biocLite.R")
#biocLite("EnsDb.Hsapiens.v86")

library("EnsDb.Hsapiens.v86")
library("ChIPpeakAnno")

## create annotation file from EnsDb or TxDb

annoData <- toGRanges(EnsDb.Hsapiens.v86, feature="gene")
annoData[1:2]

anno.gr1 <- annotatePeakInBatch(gr1, 
AnnotationData=annoData, 
output="nearestBiDirectionalPromoters",
bindingRegion=c(-2000, 500))


head(anno.gr1)

#annotate with Entrez IDs

library("org.Hs.eg.db")
anno.gr1 <- addGeneIDs(anno.gr1,
                            "org.Hs.eg.db",
                            IDs2Add = "entrez_id")
head(anno.gr1)

0.2 Motif detection

MEME-ChIP performs several motif analysis steps on a set of user provided DNA sequences in fasta format. It is especially useful for analyzing peaks from ChIP-seq experiments. MEME-ChIP can

  1. Discover novel DNA-binding motifs (MEME and DREME)
  2. Analyze them for similarity to known binding motifs (TOMTOM)
  3. Visualize the arrangement of the predicted motif sites in your input sequences (SPAMO, Fimo)
  4. Identify motif enrichment relative to a background model (Centrimo)

0.2.1 Get fasta for peak regions

bash

# If strand information is known use -s option as well
bedtools getfasta -fi /home/participant/Course_Materials/Introduction/SS_DB/Reference/hg38_chr3.fa \
-bed /home/participant/Course_Materials/ChIPSeq/Materials/motifs/Peaks_for_motif_detection.bed -fo /home/participant/Course_Materials/ChIPSeq/Materials/motifs/Tp73_peaks.fa




# find a line (3rd line in this case) and see its contents
# sed -n '3p'  filename

# delete a line (DON'T RUN unless there is a need to remove a line)
# sed -i".bak" '3d' filename

0.2.2 Meme and Dreme

bash


#look at the options in meme-chip, meme and dreme

meme-chip

meme

dreme

Meme is an EM algorithm and Dreme uses pattern matching to finds de novo motifs.

The fasta file should be 500bp for each peak, Meme looks at a 100bp window around the peak center for motifs. Repeat masking is recommended.

  • TOMTOM compares motifs discovered with known motifs from PWM databases.
  • Centrimo looks for motif enrichment.
  • MAST can visualize the location of the discovered motifs within a sequence.
  • AMA measures how strongly a motif is associated with each sequence
  • AME discovers subtly enriched known bindng motifs in the sequences.
  • FIMO detects motif locations

0.3 Motif Enrichment Analysis

Motif database for meme analysis has been setup. If it’s NOT there, follow the instructions below:

Download meme motif PWM databases onto your working folder PWM databases

bash


cd "/home/participant/Course_Materials/ChIPSeq/Materials/motifs"

wget "http://meme-suite.org/meme-software/Databases/motifs/motif_databases.12.17.tgz"
chmod 755 motif_databases.12.17.tgz

#unzip them

tar -zxvf motif_databases.12.17.tgz

0.3.1 MemeChIP

bash


# If the meme directory is missing create it
# mkdir meme

meme-chip -oc /home/participant/Course_Materials/ChIPSeq/Materials/motifs/meme/ \
-db /home/participant/Course_Materials/ChIPSeq/Materials/motif_databases/EUKARYOTE/jolma2013.meme \
-db /home/participant/Course_Materials/ChIPSeq/Materials/motif_databases/JASPAR/JASPAR_CORE_2016_vertebrates.meme \
-meme-mod zoops -meme-minw 6 -order 1 \
-meme-maxw 30 -meme-nmotifs 10 -dreme-e 0.05 -centrimo-score 5.0 -centrimo-ethresh 10.0 /home/participant/Course_Materials/ChIPSeq/Materials/motifs/Tp73_peaks.fa

or upload fasta file meme-chip

0.3.2 PScanChIP

Demonstration on web version of PScanChIP

PScanChIP website

  • Upload your bed file

0.4 Network Analysis

0.4.1 STRING

STRING website