Normalizing subsetted data #90

@oligomyeggo

Hi @LuckyMD, I've been reprocessing some old data using your single-cell tutorial workflow and have a best-practices question. (I'm not sure whether this is the right place for it, or whether it should be moved to the new scverse Discourse group.)

I have an adata object that is scran-normalized. I want to take a subset of clusters from it and create a new adata_sub object with its own dimensionality reduction to investigate a subpopulation of interest. My understanding is that, had I opted for a basic log normalization, I would not need to re-normalize adata_sub, since log normalization is done on a per-cell basis. However, because scran normalization relies on a coarse clustering of the cells present in the object, I would want to re-normalize adata_sub if adata had been normalized via scran, correct?

What would be the best way to subset and re-normalize adata_sub? I am not sure which steps of the original scran normalization need to be repeated and which can be omitted. For instance, I would want to perform a new clustering of the subsetted data for scran normalization and get new size factors, but I wouldn't need to set adata_sub.layers['counts'] = adata_sub.X.copy(), as adata_sub already contains a subsetted counts layer from adata (correct?). Would I need to set adata_sub.raw = adata_sub again in this scenario? Here is what I have so far:

# subset adata to clusters of interest
adata_sub = adata[adata.obs['leiden_r1.0'].isin(['1', '3', '5'])].copy()

# perform clustering for scran normalization
adata_sub_pp = adata_sub.copy()
#sc.pp.normalize_per_cell(adata_sub_pp, counts_per_cell_after = 1e6) - can we omit this since we did it for adata?
#sc.pp.log1p(adata_sub_pp) - can we omit this since we did it for adata?
sc.pp.pca(adata_sub_pp, n_comps = 15)
sc.pp.neighbors(adata_sub_pp)
sc.tl.leiden(adata_sub_pp, key_added = 'groups', resolution = 0.5)

# preprocess variables for scran normalization
input_groups = adata_sub_pp.obs['groups']
data_mat = adata_sub.X.T
%%R -i data_mat -i input_groups -o size_factors

size_factors = sizeFactors(computeSumFactors(SingleCellExperiment(list(counts = data_mat)), 
                                             clusters = input_groups, 
                                             min.mean = 0.1))
del adata_sub_pp

adata_sub.obs['size_factors'] = size_factors # overwrites existing ['size_factors'] from adata

# adata_sub.layers['counts'] = adata_sub.X.copy() - this can be omitted?

# Normalize adata_sub
adata_sub.X /= adata_sub.obs['size_factors'].values[:,None]
sc.pp.log1p(adata_sub) # should this be omitted?
adata_sub.X = sp.sparse.csr_matrix(adata_sub.X)
adata_sub.raw = adata_sub
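
As a sanity check on the per-cell claim above, here is a minimal sketch using only the Python standard library. It assumes "basic log normalization" means scaling each cell to a fixed total and applying log1p; the toy counts, the `keep` indices, and the `log_normalize` helper are all made up for illustration and are not part of scanpy or the tutorial.

```python
import math

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p.

    Hypothetical helper; assumes every cell has a non-zero total.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * target_sum) for c in cell])
    return normalized

# Toy raw counts: 4 cells (rows) x 3 genes (columns); values are made up.
counts = [
    [10, 0, 5],
    [3, 7, 0],
    [0, 2, 8],
    [6, 6, 6],
]
keep = [0, 2]  # hypothetical cells belonging to the clusters of interest

# Order 1: normalize the full object, then subset.
normalize_then_subset = [log_normalize(counts)[i] for i in keep]
# Order 2: subset the raw counts, then normalize.
subset_then_normalize = log_normalize([counts[i] for i in keep])

# Each cell is scaled only by its own total, so the order does not matter.
same = normalize_then_subset == subset_then_normalize
print(same)
```

Because each row is scaled by its own total only, both orders give identical values here. Following the reasoning in the question, the same equality would not be expected for scran, whose size factors depend on which cells (and which coarse clusters) are present in the object.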

Thank you for any help and advice!
