From 351ee5f055e6b4d2bfd15683f2d22bd3ea4865f7 Mon Sep 17 00:00:00 2001
From: Yu Wang <43355429+yuw444@users.noreply.github.com>
Date: Fri, 3 Oct 2025 15:00:24 -0500
Subject: [PATCH 1/2] Clarify GC bias preference in computeGCBias documentation

Updated the description of GC bias in sequencing to clarify the preference of DNA polymerases for GC-moderate regions instead of GC-rich regions.
---
 docs/content/tools/computeGCBias.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/tools/computeGCBias.rst b/docs/content/tools/computeGCBias.rst
index c3ba679f58..f0e0cfafaf 100644
--- a/docs/content/tools/computeGCBias.rst
+++ b/docs/content/tools/computeGCBias.rst
@@ -15,7 +15,7 @@ Background
 ``computeGCBias`` is based on a paper by `Benjamini and Speed `_.
 The basic assumption of the GC bias diagnosis is that an ideal sample should show a uniform distribution of sequenced reads across the genome, i.e. all regions of the genome should have similar numbers of reads, regardless of their base-pair composition.
-In reality, the DNA polymerases used for PCR-based amplifications during the library preparation of the sequencing protocols prefer GC-rich regions. This will influence the outcome of the sequencing as there will be more reads for GC-rich regions just because of the DNA polymerase's preference.
+In reality, the DNA polymerases used for PCR-based amplifications during the library preparation of the sequencing protocols prefer GC-moderate regions. This will influence the outcome of the sequencing as there will be more reads for GC-moderate regions just because of the DNA polymerase's preference. As shown in the **real-life data** below, the peak is where the GC content is moderate.
 ``computeGCbias`` will first calculate the **expected GC profile** by counting the number of DNA fragments of a fixed size per GC fraction where GC fraction is defined as the number of G's or C's in a genome region of a given length.
 The result is basically a histogram depicting the frequency of DNA fragments for each type of genome region with a GC fraction between 0 to 100 percent. This will be different for each reference genome, but is independent of the actual sequencing experiment.

From 64779257279b6eebaf8c799eabbb89909815ca1d Mon Sep 17 00:00:00 2001
From: Yu Wang <43355429+yuw444@users.noreply.github.com>
Date: Fri, 3 Oct 2025 15:23:33 -0500
Subject: [PATCH 2/2] Clarify expected values in GC-bias correction

Updated description to clarify GC-bias correction method.
---
 deeptools/correctGCBias.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/deeptools/correctGCBias.py b/deeptools/correctGCBias.py
index 1154b93688..81fdfd858e 100755
--- a/deeptools/correctGCBias.py
+++ b/deeptools/correctGCBias.py
@@ -33,7 +33,7 @@ def parse_arguments(args=None):
  ' method proposed by [Benjamini & Speed (2012). '
  'Nucleic Acids Research, 40(10)]. It will remove reads'
  ' from regions with too high coverage compared to the'
- ' expected values (typically GC-rich regions) and will'
+ ' expected values (typically GC-moderate regions) and will'
  ' add reads to regions where too few reads are seen '
  '(typically AT-rich regions). '
  'The tool ``computeGCBias`` needs to be run first to generate the '