DOI Latest release License: MIT

Computational Notebooks for "Morphology-Aware Profiling of Highly Multiplexed Tissue Images using Variational Autoencoders"

Gregory J. Baker1,2,3,&,*,#, Edward Novikov1,4,*, Shannon Coy1,2,5, Yu-An Chen1,2, Clemens B. Hug1, Zergham Ahmed1,4, Sebastián A. Cajas Ordóñez4, Siyu Huang4,%, Clarence Yapp1, Gaurav N. Joshi6, Fumiki Yanagawa6, Artem Sokolov1, Hanspeter Pfister4, Peter K. Sorger1,2,3,#

1 Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA
2 Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA
3 Department of Systems Biology, Harvard Medical School, Boston, MA
4 Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA
5 Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
6 Nikon Instruments, Lexington, MA

& Current affiliation: Division of Oncological Sciences, Knight Cancer Institute, Oregon Health & Science University, Portland, OR
% Current affiliation: Visual Computing Division, School of Computing, Clemson University, Clemson, SC

*Co-first Authors: G.J.B., E.N.
#Corresponding Authors: bakergr@ohsu.edu (G.J.B.), peter_sorger@hms.harvard.edu (P.K.S.)

Abstract

Spatial proteomics (highly multiplexed tissue imaging) provides unprecedented insight into the types, states, and spatial organization of cells within preserved tissue environments. To enable single-cell analysis, high-plex images are typically segmented using algorithms that assign marker signals to individual cells. However, conventional segmentation is often imprecise and susceptible to signal spillover between adjacent cells, interfering with accurate cell type identification. Segmentation-based methods also fail to capture the morphological detail that histopathologists rely on for disease diagnosis and staging. Here, we present a method that combines unsupervised, pixel-level machine learning using autoencoders with traditional segmentation to generate single-cell data that captures information on protein abundance, morphology, and local neighborhood in a manner analogous to human experts while overcoming signal spillover. We demonstrate the generality of this technique by applying it to CyCIF, Lunaphore COMET, and Akoya PhenoCycler data, and show that it can learn histological features across multiple spatial scales.

Running the computational notebooks

Python code in this GitHub repository is organized as Jupyter notebooks that generate the figures shown in the paper. To view the notebooks, first clone the repository onto your computer by opening a terminal window and entering the command below. If Git is not already installed, it can be downloaded by following the official installation instructions.

git clone https://github.com/labsyspharm/vae-paper.git

Next, change directories into the top-level directory of the cloned repository, then create and activate a dedicated Conda environment containing the Python libraries needed to run the code. If conda is not already installed, it can be downloaded by following the official installation instructions.

cd <path/to/cloned/repo>

# macOS
conda env create -f environment_macOS.yml
conda activate morphaeus

# PC
conda env create -f environment_PC.yml
conda activate morphaeus
pip install git+https://github.com/labsyspharm/vae.git@v0.0.7

To browse the notebooks, change directories to the src folder and launch Jupyter Lab:

jupyter lab

Notebooks are pre-populated with output cells for ease of review. To re-run the notebooks, or to explore the multiplexed images that some notebooks display in the Napari image viewer, the input data must first be downloaded from our public Amazon S3 bucket (instructions are provided in the section below).


Downloading input data files

To re-run the Jupyter notebooks, input data must first be downloaded from our public Amazon S3 bucket into the top-level directory of the cloned repository by running the download.py script (located in the src folder) from the top level of the repository. In addition to the required data, this script also downloads a folder of precomputed output files (output_reference) for at-a-glance reference:

# from the top-level directory of the cloned vae-paper GitHub repository
python src/download.py

Note: ~335 GB of storage space is required to download the complete file set.

To re-run any of the Jupyter notebooks, double-click a notebook filename in the file browser at the left of the screen to open the corresponding notebook at the right. Next, click the double-arrow button at the top of the notebook interface to restart the kernel and run all of the code cells. Notebook output is saved to a folder called output in the top-level directory of the repository.
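
If you prefer to execute a notebook non-interactively, Jupyter's nbconvert tool can run all cells from the command line and save the results in place. This is a general-purpose alternative to the JupyterLab interface rather than a step documented by the pipeline itself; replace <notebook>.ipynb with the filename of the notebook you want to run:

# optional: headless execution of a notebook from the src folder
jupyter nbconvert --to notebook --execute --inplace <notebook>.ipynb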


MORPHÆUS source code and demo

MORPHÆUS source code is freely available for academic re-use under the MIT license on GitHub.

To run a demonstration of the MORPHÆUS pipeline, be sure that input data files have first been downloaded as described above, then change directories to the demo directory in the cloned repository and run the following command:

vae config.yml

This will execute the pipeline on 13 x 13 µm image patches from the CyCIF-1A image presented in the paper, demonstrating all major modules, from single-cell sampling and image patch cropping to VAE model training, plot visualization, and concept saliency analysis. Depending on the size of the images, the cutting and storage of image patches generated in the RUN_CELLCUTTER module can be memory-limiting; a minimum of 32 GB of RAM is required to run this demo without having to alter the cache_size_cellcutter and cells_per_chunk parameters in the MORPHÆUS configuration file (config.yml). Output is saved to demo/VAE13/.
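
If you do need to reduce the memory footprint on a smaller machine, the relevant settings can be located in the configuration file before editing. A minimal shell sketch, assuming the parameter names above appear verbatim in config.yml:

# show the current memory-related settings in the demo configuration
grep -E 'cache_size_cellcutter|cells_per_chunk' config.yml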

For convenience, lightly pre-trained encoder and decoder networks are provided so that the pipeline skips the VAE training module. For those interested in training a model from scratch, simply add a # to the beginning of the encoder.hdf5 and decoder.hdf5 filenames in demo/VAE13/6_train_vae/ before running the pipeline; do the same for the TRAIN_VAE.txt checkpoint file in demo/VAE13/checkpoints/. When training on a relatively modern CPU, each epoch takes an estimated 5 minutes to complete; training can be accelerated greatly using GPU resources.
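
For example, the following shell commands, run from the top-level directory of the cloned repository, rename those files accordingly (a sketch assuming the paths above; adjust if your checkout differs):

# disable the pre-trained weights and the training checkpoint so TRAIN_VAE runs from scratch
mv demo/VAE13/6_train_vae/encoder.hdf5 'demo/VAE13/6_train_vae/#encoder.hdf5'
mv demo/VAE13/6_train_vae/decoder.hdf5 'demo/VAE13/6_train_vae/#decoder.hdf5'
mv demo/VAE13/checkpoints/TRAIN_VAE.txt 'demo/VAE13/checkpoints/#TRAIN_VAE.txt'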


Zenodo archive

This GitHub repository will be archived on Zenodo following publication of the manuscript.


Funding

This work was supported by NCI grant U01-CA284207, the Harvard Ludwig Center (P.K.S., S.S.), an ASPIRE Award from The Mark Foundation for Cancer Research, and the David Liposarcoma Research Initiative, and was initiated as part of the computational toolbox for the Human Tumor Atlas Network (HTAN).


References

Baker GJ, Novikov E, et al. Morphology-Aware Profiling of Highly Multiplexed Tissue Images using Variational Autoencoders. bioRxiv (2025). https://doi.org/10.1101/2025.06.23.661064
