Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,11 +42,11 @@ MungeCredibleSet:
Promoters: the absolute path to the file containing promoter annotations. E.g.,{path}/V2G/resources/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.bed"

UbiquitouslyExpressedGenes: the absolute path to the bed file containing ubiquitously expressed genes. E.g., {path}/V2G/resources/UbiquitouslyExpressedGenes.txt
ABCPRED: the absolute path to the ABC results for variant overlapping. E.g.{path}/V2G/resources/size_sorted_CombinedPredictions.AvgHiC.ABC0.015.minus150.txt.gz"
ABCPRED: the absolute path to the ABC results for variant overlapping. Please include all relevant cell types and stimulation conditions. E.g.{path}/V2G/resources/size_sorted_CombinedPredictions.AvgHiC.ABC0.015.minus150.txt.gz"

ALLPEAKS: the absolute path to the directory containing [cell type-specific peaks](https://mitra.stanford.edu/engreitz/public/SchnitzlerKang2023/EnhancerList.minus150).

CelltypeTable: the absolute path to the table specifying cell type name patterns for cell type groups. The table is for catagorizing the cell types in ABCPRD and ALLPEAKS. E.g. {path}/V2G/resources/grouped_celltype_table.txt"
CelltypeTable: the absolute path to the table specifying cell type name patterns for cell type groups. Please include names for all target cell types in this table. The table is for catagorizing the cell types in ABCPRD and ALLPEAKS. E.g. {path}/V2G/resources/grouped_celltype_table.txt"
Copy link

Copilot AI Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "catagorizing" should be "categorizing".

Suggested change
CelltypeTable: the absolute path to the table specifying cell type name patterns for cell type groups. Please include names for all target cell types in this table. The table is for catagorizing the cell types in ABCPRD and ALLPEAKS. E.g. {path}/V2G/resources/grouped_celltype_table.txt"
CelltypeTable: the absolute path to the table specifying cell type name patterns for cell type groups. Please include names for all target cell types in this table. The table is for categorizing the cell types in ABCPRD and ALLPEAKS. E.g. {path}/V2G/resources/grouped_celltype_table.txt"

Copilot uses AI. Check for mistakes.

LipidBloodAssociationTable: the absolute path to the table containing variants associated with lipid level regulation. E.g., {path}/V2G/resources/lipid.level.csv"

Expand Down
2 changes: 1 addition & 1 deletion resources/ABC_overlap.header
Original file line number Diff line number Diff line change
@@ -1 +1 @@
chr start end variant PosteriorProb Disease CredibleSet chr start end name class activity_base TargetGene TargetGeneTSS TargetGeneExpression TargetGenePromoterActivityQuantile TargetGeneIsExpressed distance isSelfPromoter powerlaw_contact powerlaw_contact_reference hic_contact hic_contact_pl_scaled hic_pseudocount hic_contact_pl_scaled_adj ABC.Score.Numerator ABC.Score powerlaw.Score.Numerator powerlaw.Score CellType
chr start end name class TargetGene TargetGeneEnsemblID TargetGeneTSS isSelfPromoter CellType distanceToTSS.Featuren ormalizedDNase_prom.Feature 3DContact.Feature ABC.Score.Feature numCandidateEnhGene.Feature numTSSEnhGene.Feature sumNearbyEnhancers.Feature ubiquitousExpressedGene.Feature Score
Copy link

Copilot AI Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tab character in the header. The header has "distanceToTSS.Featuren ormalizedDNase_prom.Feature" which should be "distanceToTSS.Feature NormalizedDNase_prom.Feature" (tab should separate "Feature" and "NormalizedDNase_prom.Feature").

Suggested change
chr start end name class TargetGene TargetGeneEnsemblID TargetGeneTSS isSelfPromoter CellType distanceToTSS.Featuren ormalizedDNase_prom.Feature 3DContact.Feature ABC.Score.Feature numCandidateEnhGene.Feature numTSSEnhGene.Feature sumNearbyEnhancers.Feature ubiquitousExpressedGene.Feature Score
chr start end name class TargetGene TargetGeneEnsemblID TargetGeneTSS isSelfPromoter CellType distanceToTSS.Feature NormalizedDNase_prom.Feature 3DContact.Feature ABC.Score.Feature numCandidateEnhGene.Feature numTSSEnhGene.Feature sumNearbyEnhancers.Feature ubiquitousExpressedGene.Feature Score

Copilot uses AI. Check for mistakes.
2 changes: 1 addition & 1 deletion resources/grouped_celltype_table.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Telo|telo|HUVEC|EPC|EC|endocardium|endothelial|ahy926|Endothelial|Endocardial|en
vcm_fetal|acm_fetal|Day30_Cardiomyocytes|Day15_cardiomyocytes|cardiac_muscle_cell-ENCODE CM
smc|SMC|coronary_artery_smooth_muscle_cell-Miller2016 SMC
fib Fibroblasts
hepatocyte|HepG2|liver Hepatocytes
hepatocyte|HepG2|liver|HuH-7|HuH-7.5|hepatic Hepatocytes
Adipocyte|adipose Adipose
ventricle|F6|F8|F19|coronary_artery-ENCODE HeartOrArteryTissue
H1_BMP4_Derived_Mesendoderm_Cultured_Cells-Roadmap|hESC|telo|Telo|SMC|Day5_cardiacprogenitors|Day30_Cardiomyocytes|Day2_mesoderm|Day15_cardiomyocytes|Day0_ipsc|coronary_artery_smooth_muscle_cell-Miller2016 CellModels
Expand Down
41 changes: 26 additions & 15 deletions snakemake/workflow/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,16 @@ def getZeroIndexed(wildcards):

def getExcludeVariants(wildcards):
return(str(variant_table.loc[wildcards.trait, "ExcludeVariants"]))

## Function to extract whether PIP information is included
def hasPIP(wildcards):
# test if variant_table column PIP is NA in various form
return(str(variant_table.loc[wildcards.trait, "PIPCol"] != "NA" and variant_table.loc[wildcards.trait, "PIPCol"] != "None"))


##########################################################
rule CreateCredibleSets:
""" Reformat variant list and create a summary table of each credible set, including information such as whether the variants in the credible sets overlap a promoter, are coding variants or are splicing variants. """
""" Reformat variant list and create a summary table of each credible set, including information such as whether the variants in the credible sets overlap a promoter, are coding variants or are splicing variants. """
input:
fine_mapped_variants=getVariants,
params:
Expand Down Expand Up @@ -91,7 +98,7 @@ rule CreateCredibleSets:
"""

rule create_locus_window:
""" Create a +/- 2 million base pairs window centerted at the lead variant for each credible set. """
""" Create a +/- 2 million base pairs window centerted at the lead variant for each credible set. """
input:
variant_list=os.path.join(config["OutDir"], "{trait}", "variant.list.txt"),
cs_list=os.path.join(config["OutDir"], "{trait}", "all.cs.txt")
Expand All @@ -114,7 +121,7 @@ rule create_locus_window:
"""

rule intersect_window_w_genes:
""" Intersect the locus windows with gene promoters. """
""" Intersect the locus windows with gene promoters. """
input:
locus_window_table=os.path.join(config["OutDir"], "{trait}","intermediate_files", "{trait}_gwas_table_base.bed"),
params:
Expand All @@ -135,7 +142,7 @@ rule intersect_window_w_genes:
"""

rule merge_locus_window_w_genes:
""" Merge the genes overlapping locus window with their corresponding credible set information. Filter genes at each credible set to keep at least two genes on each side of the lead variants (4 in total); and include genes > +/- 500 kb away from the lead variants if there are < 2 genes on each side of the variant within the +/- 500kb window until there are at least 2 genes on each side of the lead variant. """
""" Merge the genes overlapping locus window with their corresponding credible set information. Filter genes at each credible set to keep at least two genes on each side of the lead variants (4 in total); and include genes > +/- 500 kb away from the lead variants if there are < 2 genes on each side of the variant within the +/- 500kb window until there are at least 2 genes on each side of the lead variant. """
input:
locus_window_table=os.path.join(config["OutDir"], "{trait}","intermediate_files", "{trait}_gwas_table_base.txt"),
gene_overlap_window=os.path.join(config["OutDir"], "{trait}","intermediate_files", "{trait}_gwas_table_base.sorted.gene_overlap.bed"),
Expand All @@ -161,7 +168,7 @@ rule merge_locus_window_w_genes:
"""

rule intersectWithABC:
""" Intersect variants with ABC enhancers"""
""" Intersect variants with ABC enhancers"""
input:
variants=os.path.join(config["OutDir"], "{trait}", "variant.list.txt")
output:
Expand All @@ -181,12 +188,12 @@ rule intersectWithABC:
sed 1d | \
bedtools sort -i stdin -g {params.hg19sizes} > {output.sorted_variants}

cat {params.header} > {output.ABC_overlap}
{{ printf 'chr\tstart\tend\tvariant\tPosteriorProb\tTrait\tCredibleSet\t'; zcat {params.abcPred} | head -1; }} > {output.ABC_overlap}
zcat {params.abcPred} | sed 1d | bedtools sort -faidx {params.hg19sizes} -i stdin |bedtools intersect -sorted -wa -wb -a {output.sorted_variants} -b stdin -g {params.hg19sizes} >> {output.ABC_overlap}
"""

rule intersectWithPeaks:
""" Intersect variants with cell-type specific peaks."""
""" Intersect variants with cell-type specific peaks."""
input:
sorted_variants=os.path.join(config["OutDir"], "{trait}", "intermediate_files", "Variants.sorted.bed"),
output:
Expand All @@ -213,7 +220,7 @@ rule intersectWithPeaks:
"""

rule intersectWithCelltypes:
""" Create summary tables of ABC enhancer-overlapping variants."""
""" Create summary tables of ABC enhancer-overlapping variants."""
input:
ABCOverlap=os.path.join(config["OutDir"], "{trait}", "ABCOverlapFull.tsv"),
PeakOverlapFull=os.path.join(config["OutDir"], "{trait}", "PeaksOverlapFull.tsv"),
Expand Down Expand Up @@ -249,7 +256,7 @@ rule intersectWithCelltypes:
"""

rule groupABCByCelltypes:
""" Summarize ABC enhancer-overlapping variants by cell-type group."""
""" Summarize ABC enhancer-overlapping variants by cell-type group."""
input:
ABC_overlap_ranked=os.path.join(config["OutDir"], "{trait}", "ABCOverlapFull.ranked.tsv")
output:
Expand All @@ -260,7 +267,8 @@ rule groupABCByCelltypes:
grouped_celltype_table=config["CelltypeTable"],
source=getSource,
codeDir=config["CodeDir"],
PIP=config["groupABCByCelltypes"]["PIP"]
PIP=config["groupABCByCelltypes"]["PIP"],
hasPIP=hasPIP,
resources:
mem_gb=16,
runtime_hr=3
Expand All @@ -273,11 +281,12 @@ rule groupABCByCelltypes:
--ranked_ABC_table {input.ABC_overlap_ranked} \
--source {params.source} \
--PIP {params.PIP} \
--hasPIP {params.hasPIP} \
--helperFunctions {params.helper_functions}
"""

rule groupPeakOverlap:
""" Summarize peak-overlapping variants by cell-type group."""
""" Summarize peak-overlapping variants by cell-type group."""
input:
CredibleSet_peak_overlap_summary=os.path.join(config["OutDir"], "{trait}", "CredibleSetPeakOverlapSummary.tsv"),
PeaksOverlapFull_Peaks=os.path.join(config["OutDir"],"{trait}", "PeaksOverlapFull.Peaks.tsv"),
Expand All @@ -290,7 +299,8 @@ rule groupPeakOverlap:
grouped_celltype_tables=config["CelltypeTable"],
source=getSource,
codeDir=config["CodeDir"],
PIP=config["groupPeakOverlap"]["PIP"]
PIP=config["groupPeakOverlap"]["PIP"],
hasPIP=hasPIP,
resources:
mem_gb=16,
runtime_hr=3
Expand All @@ -304,10 +314,11 @@ rule groupPeakOverlap:
--cellGroup {wildcards.cellGroup} \
--source {params.source} \
--PIP {params.PIP} \
--hasPIP {params.hasPIP} \
--helperFunctions {params.helper_functions}
"""
rule createABCtable:
"""Create an information table summarizing the locations of the intermediate files for rule combinedOutputTables. """
"""Create an information table summarizing the locations of the intermediate files for rule combinedOutputTables. """
input:
credibleset_peakinfo=expand(os.path.join(config["OutDir"],"{{trait}}", "CredibleSetPeakOverlapSummary.with{cellGroup}peakinfo.tsv"),cellGroup=CellGroupsOfInterest),
credibleset_peakinfo_cellgroup_only=expand(os.path.join(config["OutDir"],"{{trait}}", "CredibleSetPeakOverlapSummary.withpeakinfo.only{cellGroup}.tsv"),cellGroup=CellGroupsOfInterest),
Expand All @@ -328,7 +339,7 @@ rule createABCtable:


rule combineOutputTables:
""" Combine intermediate tables to create the final credible set to gene table."""
""" Combine intermediate tables to create the final credible set to gene table."""
input:
base_df=os.path.join(config["OutDir"], "{trait}","intermediate_files", "{trait}_gwas_table_base_genes.txt"),
ABC_overlapping_df=os.path.join(config["OutDir"],"{trait}", "peak_ABC_overlapping_table.tsv"),
Expand Down Expand Up @@ -358,7 +369,7 @@ rule combineOutputTables:
"""

rule getVariantInfo:
""" combineOutputTables creates a table with credible set level information. getVariantInfo details the variants that contribute to the credible set-gene linking."""
""" combineOutputTables creates a table with credible set level information. getVariantInfo details the variants that contribute to the credible set-gene linking."""
input:
combined_table=os.path.join(config["OutDir"],"{trait}", "{trait}_cell2gene.txt"),
variant_list=os.path.join(config["OutDir"], "{trait}", "variant.list.txt"),
Expand Down
4 changes: 3 additions & 1 deletion snakemake/workflow/scripts/CreateCS.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,11 @@ option.list <- list(
make_option("--excludeVariants", type="character", help="Variant to exclude")
)
opt <- parse_args(OptionParser(option_list=option.list))


source(opt$helperFunctions)
##############################################################################
variants=read.table(opt$fineMappedVariants, header=T, stringsAsFactors=F)
variants=read.table(opt$fineMappedVariants, header=T, stringsAsFactors=F, fill=T)
##############################################################################
## Load common data
genes <- readBed(opt$genes)
Expand Down
8 changes: 5 additions & 3 deletions snakemake/workflow/scripts/celltype.overlap.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,10 @@ option.list <- list(
)

opt <- parse_args(OptionParser(option_list=option.list))

source(opt$helperFunctions)
setwd(opt$outDir)
abc <- read.delim(opt$ABCOverlap)
abc <- read.delim(opt$ABCOverlap)
if (opt$removeNonCoding){
abc <- abc %>% filter(!greplany(c("^LINC","-AS","^MIR","RNU","^LOC"),TargetGene))
}else{
Expand All @@ -26,7 +27,8 @@ if (opt$removeNonCoding){
variants <- read.delim(opt$variantList)
all.cs <- read.delim(opt$csList)

geneRanks <- abc %>% group_by(CredibleSet,TargetGene) %>% summarise(MaxABC=max(ABC.Score)) %>% as.data.frame()
ABC.Score.column <- if("ABC.Score" %in% colnames(abc)) "ABC.Score" else "Score"
geneRanks <- abc %>% group_by(CredibleSet,TargetGene) %>% summarise(MaxABC=max(get(ABC.Score.column))) %>% as.data.frame()
geneRanks <- geneRanks %>% group_by(CredibleSet) %>% mutate(GeneRank=dense_rank(-MaxABC)) %>% as.data.frame()
abc.ranked <- merge(abc, geneRanks)
write.table(abc.ranked, file="ABCOverlapFull.ranked.tsv",row.names=F, col.names=T, sep='\t', quote=F)
Expand All @@ -44,7 +46,7 @@ tmp <- abc.ranked %>% group_by(variant,CredibleSet) %>%
summarise(
AllCellTypes=paste0(CellType, collapse=','),
TargetGenes=paste0(unique(TargetGene), collapse=','),
TopGene=unique(TargetGene[which.max(ABC.Score)])) %>%
TopGene=unique(TargetGene[which.max(get(ABC.Score.column))])) %>%
merge(all.cs) %>% arrange(CredibleSet) %>% unique() %>% as.data.frame()

write.table(tmp, file="ABCVariantOverlapSummary.tsv",row.names=F, col.names=T, sep='\t', quote=F)
Expand Down
Loading