Skip to content

SEACR input and spike-in normalization #77

@jordan841220

Description

@jordan841220

Thank you for your contribution in this amazing tool. My question might be naive but I couldn't find any explanation in the paper or github (or maybe I simply ignored it).

  1. From the peak calling section in the pipeline as described below, I found that the input bedgraph file of SEACR was inherently the intermediate ouput of macs2 callpeak. I was curious why this CUT&RUN pipeline was designed like this, instead of converting the original BAM/BED file to bedgraph file using bedtools genomecov, and use it as the input of SEACR (I have tried and the results were very different), which was wrote in the SEACR tutorial.
# macs2 narrow peak calling
>&2 echo "[info] macs2 narrow peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdir -q 0.01 -B --SPMR --keep-dup all 2> $logdir/"$base_file".macs2.narrow

# macs2 broad peak calling
>&2 echo "[info] macs2 broad peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdirbroad --broad --broad-cutoff 0.1 -B --SPMR --keep-dup all 2> $logdir/"$base_file".macs2.broad

# macs2 broad peak calling
>&2 echo "[info] Getting broad peak summits"
$pythonbin/python $extratoolsbin/get_summits_broadPeak.py $outdirbroad/"$base_file"_peaks.broadPeak | $bedopsbin/sort-bed - > $outdirbroad/"$base_file"_summits.bed

# SEACR peak calls
>&2 echo "[info] SEACR stringent peak calling"
$macs2bin/macs2 callpeak -t $bam_file -g $macs2_genome -f BAMPE -n $base_file --outdir $outdirseac -q 0.01 -B --SPMR --keep-dup all 2> $logdir/"$base_file".seacr
$pythonbin/python $extratoolsbin/change.bdg.py $outdirseac/"$base_file"_treat_pileup.bdg > $outdirseac/"$base_file"_treat_integer.bdg
chmod +x $extratoolsbin/SEACR_1.1.sh
$extratoolsbin/SEACR_1.1.sh $outdirseac/"$base_file"_treat_integer.bdg 0.01 non stringent $outdirseac/"$base_file"_treat $Rscriptbin
$bedopsbin/sort-bed $outdirseac/"$base_file"_treat.stringent.bed > $outdirseac/"$base_file"_treat.stringent.sort.bed
$pythonbin/python $extratoolsbin/get_summits_seacr.py $outdirseac/"$base_file"_treat.stringent.bed|$bedopsbin/sort-bed - > $outdirseac/"$base_file"_treat.stringent.sort.summits.bed
for i in _summits.bed _peaks.xls _peaks.narrowPeak _control_lambda.bdg _treat_pileup.bdg; do 
    rm -rf $outdirseac/"$base_file"$i
done
  1. The pipeline conducted the spike_in normalization using bamCoverage after peak calling. Isn't it make sense to use the spike_in-normalized bedgraph file to perform peak calling?

Thanks!!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions