Code and derived parameters for the manuscript: Polymer-derived distance penalties improve chromatin interaction predictions from single-cell data across crop genomes.
This repository provides a framework to correct the systematic distance bias found in proxy data for 3D genome architecture, such as single-cell co-accessibility scores (scATAC-seq) or deep learning predictions. The method is grounded in polymer physics and uses experimental Hi-C data to derive a species-specific penalty function.
The method takes biased proxy data (e.g., co-accessibility scores) and a reference Hi-C dataset as input. It fits a multi-component power-law model to the Hi-C data to derive a penalty function, which is then applied to the proxy scores to produce a corrected, physically realistic interaction map.
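As a rough illustration of what fitting a multi-component power law can look like, the sketch below fits two separate power-law regimes to a synthetic contact-decay curve using log-log least squares. It is not the notebook's implementation: the two-regime form, the breakpoint `break_bp`, the function names, and the synthetic data are all assumptions made for this example.

```python
import math

def fit_power_law(distances, counts):
    """Least-squares fit of log(count) = log(a) - alpha * log(distance)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (prefactor a, exponent alpha)

def fit_two_regime(distances, counts, break_bp):
    """Fit one power law below and one above a hypothetical breakpoint (bp)."""
    short = [(d, c) for d, c in zip(distances, counts) if d < break_bp]
    long_ = [(d, c) for d, c in zip(distances, counts) if d >= break_bp]
    return fit_power_law(*zip(*short)), fit_power_law(*zip(*long_))

# Synthetic decay curve (assumed values, not real Hi-C data):
# exponent 0.8 below 500 kb, exponent 1.5 above, continuous at the break.
break_bp = 500_000
distances = [10_000 * 2 ** i for i in range(10)]
counts = []
for d in distances:
    if d < break_bp:
        counts.append(1e6 * d ** -0.8)
    else:
        counts.append(1e6 * break_bp ** -0.8 * (d / break_bp) ** -1.5)

(short_a, short_alpha), (long_a, long_alpha) = fit_two_regime(
    distances, counts, break_bp
)
print(f"short-range exponent: {short_alpha:.3f}")
print(f"long-range exponent:  {long_alpha:.3f}")
```

Because the synthetic curve is an exact piecewise power law, the two fitted exponents recover the values used to generate it. Real loop data would be noisy, and the breakpoint would normally be estimated rather than fixed.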
--
To get started, clone the repository and install the required Python packages.

    git clone https://github.com/jlab-code/polymer-penalty.git
    cd polymer-penalty
    pip install -r requirements.txt

This framework has two main workflows:
- Deriving a new, custom penalty function from your own Hi-C data
- Applying a pre-computed penalty function to correct your proxy data (e.g., co-accessibility scores)
To generate a penalty function for a new species or cell type, you will need a file of chromatin loops from Hi-C data.
- Place your Hi-C loop file (in .bedpe format) in the data/ directory.
- Open and run the Jupyter Notebook: scripts/get_penalty_function.ipynb.
- In the notebook, find the cell that defines the input file path and change it to your file:

      # Before
      hic_path = "../data/soybean_leaf_HiC.hiccups_loops.fdr0.1.bedpe"
      # After
      hic_path = "../data/your_hic_data.bedpe"

- Run all the cells in the notebook.
- The script will output a new file containing your custom parameters, e.g. penalty_functions/your_species_penalty_parameters.tsv.
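
Before running the notebook, it can help to check that a loop file yields sensible genomic distances. The sketch below reads the first six standard BEDPE columns (chrom1, start1, end1, chrom2, start2, end2) and computes midpoint-to-midpoint distances for intrachromosomal loops. The function name `loop_distances` and the example rows are made up for illustration; whether the notebook measures distance between anchor midpoints is an assumption.

```python
import csv
import io

def loop_distances(bedpe_lines):
    """Return midpoint-to-midpoint distances (bp) for intrachromosomal loops."""
    distances = []
    reader = csv.reader(bedpe_lines, delimiter="\t")
    for row in reader:
        # Skip blank lines, comment/header lines, and malformed rows.
        if len(row) < 6 or row[0].startswith("#"):
            continue
        chrom1, start1, end1, chrom2, start2, end2 = row[:6]
        if chrom1 != chrom2:
            continue  # interchromosomal pairs have no linear distance
        mid1 = (int(start1) + int(end1)) // 2
        mid2 = (int(start2) + int(end2)) // 2
        distances.append(abs(mid2 - mid1))
    return distances

# Made-up example rows; replace with open("../data/your_hic_data.bedpe").
example = io.StringIO(
    "chr1\t10000\t15000\tchr1\t110000\t115000\n"
    "chr1\t20000\t25000\tchr2\t50000\t55000\n"
    "chr2\t0\t10000\tchr2\t500000\t510000\n"
)
distances = loop_distances(example)
print(distances)  # [100000, 500000]
```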
Once you have a penalty function (either one we provide or one you derived), you can use it to correct your co-accessibility or deep learning scores.
- Place your co-accessibility file (in .csv or .tsv format) in the data/ directory.
- Open and run the Jupyter Notebook: scripts/apply_penalty.ipynb.
- In the first code cell, update the SPECIES_TO_RUN variable and ensure the file paths are correct for your data.

      # USER: Select which species to run
      SPECIES_TO_RUN = "Soybean"  # Options: "Maize", "Soybean", "Rice", or your custom name

      # --- Update file paths for your custom data ---
      coacr_file = os.path.join(data_directory, "your_coaccessibility_scores.csv")
      penalty_file = os.path.join(penalty_directory, "your_species_penalty_parameters.tsv")

- Run all the cells in the notebook.
- The script will generate a new file with the corrected scores in the results/ directory.
A test run using the provided example data for soybean is configured by default in the notebooks.
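
To make the correction step more concrete, here is a minimal sketch of applying a distance penalty to proxy scores. It assumes a multiplicative correction and a single power-law penalty that is flat up to a reference distance. The names `power_law_penalty`, `apply_penalty`, `alpha`, and `reference_bp`, and all numeric values, are hypothetical and do not reflect the actual parameter file or notebook logic.

```python
def power_law_penalty(distance_bp, alpha, reference_bp=10_000):
    """Hypothetical penalty: 1.0 up to reference_bp, then power-law decay."""
    if distance_bp <= reference_bp:
        return 1.0
    return (distance_bp / reference_bp) ** (-alpha)

def apply_penalty(pairs, alpha):
    """Multiply each score by the penalty for its distance.

    pairs: iterable of (distance_bp, score) tuples.
    """
    return [score * power_law_penalty(d, alpha) for d, score in pairs]

# Three peak pairs with the same raw score at increasing distances (made up).
pairs = [(5_000, 0.9), (100_000, 0.9), (1_000_000, 0.9)]
corrected = apply_penalty(pairs, alpha=1.0)

for (distance, raw), new in zip(pairs, corrected):
    print(f"{distance:>9} bp  raw={raw:.3f}  corrected={new:.4f}")
```

In this toy setup, identical raw scores are down-weighted more strongly as the distance between the two regions grows, which is the qualitative behavior a distance penalty is meant to produce.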
