bash
cd /home/participant/Course_Materials/ChIPSeq/Materials/
#mkdir motifs if it doesn't exist
cd motifs
R
# read in the peak summit file
setwd("/home/participant/Course_Materials/ChIPSeq/Materials/motifs")
peakfile <-read.table("/home/participant/Course_Materials/ChIPSeq/Materials/Peaks/TAp73beta_r2.fastq_trimmed.fastq_sorted_filterdup.bed")
# get a 400bp window around summit (-200 and +200)
Motif_Peaks <- data.frame((peakfile$V1), (peakfile$V2-250), (peakfile$V2+250), stringsAsFactors = FALSE)
# create bed file
options(scipen=999)
write.table(Motif_Peaks, file= "/home/participant/Course_Materials/ChIPSeq/Materials/motifs/Peaks_for_motif_detection.bed",
row.names=FALSE,col.names = FALSE, sep="\t", quote =FALSE)
dir()
R
library("ChIPpeakAnno")
library("GenomicRanges")
#library("rtracklayer")
#bed <- import("/home/participant/Course_Materials/ChIPSeq/Materials/motifs/Peaks_for_motif_detection.bed")
#gr1 <- GRanges(seqnames=bed$V1, ranges=IRanges(start=bed$V2, end=bed$V3))
df1<-read.table("Peaks_for_motif_detection.bed", header=FALSE)
gr1 <- GRanges(seqnames=df1$V1, ranges=IRanges(start=df1$V2, end=df1$V3))
##annotation
#source("https://bioconductor.org/biocLite.R")
#biocLite("EnsDb.Hsapiens.v86")
library("EnsDb.Hsapiens.v86")
library("ChIPpeakAnno")
## create annotation file from EnsDb or TxDb
annoData <- toGRanges(EnsDb.Hsapiens.v86, feature="gene")
annoData[1:2]
anno.gr1 <- annotatePeakInBatch(gr1,
AnnotationData=annoData,
output="nearestBiDirectionalPromoters",
bindingRegion=c(-2000, 500))
head(anno.gr1)
#annotate with Entrez IDs
library("org.Hs.eg.db")
anno.gr1 <- addGeneIDs(anno.gr1,
"org.Hs.eg.db",
IDs2Add = "entrez_id")
head(anno.gr1)
MEME-ChIP performs several motif analysis steps on a set of user provided DNA sequences in fasta format. It is especially useful for analyzing peaks from ChIP-seq experiments. MEME-ChIP can
bash
# If strand information is known use -s option as well
bedtools getfasta -fi /home/participant/Course_Materials/Introduction/SS_DB/Reference/hg38_chr3.fa \
-bed /home/participant/Course_Materials/ChIPSeq/Materials/motifs/Peaks_for_motif_detection.bed -fo /home/participant/Course_Materials/ChIPSeq/Materials/motifs/Tp73_peaks.fa
# find a line (3rd line in this case) and see its contents
# sed -n '3p' filename
# delete a line (DON'T RUN unless there is a need to remove a line)
# sed -i".bak" '3d' filename
bash
#look at the options in meme-chip, meme and dreme
meme-chip
meme
dreme
Meme is an EM algorithm and Dreme uses pattern matching to finds de novo motifs.
The fasta file should be 500bp for each peak, Meme looks at a 100bp window around the peak center for motifs. Repeat masking is recommended.
Motif database for meme analysis has been setup. If it’s NOT there, follow the instructions below:
Download meme motif PWM databases onto your working folder PWM databases
bash
cd "/home/participant/Course_Materials/ChIPSeq/Materials/motifs"
wget "http://meme-suite.org/meme-software/Databases/motifs/motif_databases.12.17.tgz"
chmod 755 motif_databases.12.17.tgz
#unzip them
tar -zxvf motif_databases.12.17.tgz
bash
# If the meme directory is missing create it
# mkdir meme
meme-chip -oc /home/participant/Course_Materials/ChIPSeq/Materials/motifs/meme/ \
-db /home/participant/Course_Materials/ChIPSeq/Materials/motif_databases/EUKARYOTE/jolma2013.meme \
-db /home/participant/Course_Materials/ChIPSeq/Materials/motif_databases/JASPAR/JASPAR_CORE_2016_vertebrates.meme \
-meme-mod zoops -meme-minw 6 -order 1 \
-meme-maxw 30 -meme-nmotifs 10 -dreme-e 0.05 -centrimo-score 5.0 -centrimo-ethresh 10.0 /home/participant/Course_Materials/ChIPSeq/Materials/motifs/Tp73_peaks.fa
or upload fasta file meme-chip