Hi @LuckyMD, I've been reprocessing some old data using your single-cell tutorial workflow and have a best-practices question. (I am not sure whether this is the correct place for it, or whether it should be moved to the new scverse Discourse group.)

I have an adata object that is scran-normalized. I want to take a subset of clusters from it and create a new adata_sub object with its own dimensionality reduction to investigate a subpopulation of interest. My understanding is that, had I opted for a basic log normalization, I would not need to re-normalize adata_sub, because log normalization is done on a per-cell basis. However, because scran normalization uses a coarse clustering of the cells present in the object, I would want to re-normalize adata_sub if adata had been normalized via scran, correct?

What would be the best way to subset and re-normalize adata_sub? I am not sure which steps of the original scran normalization process need to be repeated and which can be omitted. For instance, I would want to perform a new clustering of the subsetted data for scran normalization and get new size factors, but I wouldn't need to set adata_sub.layers['counts'] = adata_sub.X.copy(), as adata_sub already contains a subsetted counts layer from adata (correct?). Would I need to re-assign adata_sub.raw = adata_sub in this scenario? My current attempt:
# subset adata to clusters of interest
adata_sub = adata[adata.obs['leiden_r1.0'].isin(['1', '3', '5'])].copy()
# perform clustering for scran normalization
adata_sub_pp = adata_sub.copy()
#sc.pp.normalize_per_cell(adata_sub_pp, counts_per_cell_after = 1e6) - can we omit this since we did it for adata?
#sc.pp.log1p(adata_sub_pp) - can we omit this since we did it for adata?
sc.pp.pca(adata_sub_pp, n_comps = 15)
sc.pp.neighbors(adata_sub_pp)
sc.tl.leiden(adata_sub_pp, key_added = 'groups', resolution = 0.5)
# preprocess variables for scran normalization
input_groups = adata_sub_pp.obs['groups']
data_mat = adata_sub.X.T

%%R -i data_mat -i input_groups -o size_factors
size_factors = sizeFactors(computeSumFactors(SingleCellExperiment(list(counts = data_mat)),
                                             clusters = input_groups,
                                             min.mean = 0.1))

del adata_sub_pp
adata_sub.obs['size_factors'] = size_factors # overwrites existing ['size_factors'] from adata
# adata_sub.layers['counts'] = adata_sub.X.copy() - this can be omitted?
# Normalize adata_sub
adata_sub.X /= adata_sub.obs['size_factors'].values[:,None]
sc.pp.log1p(adata_sub) # should this be omitted?
adata_sub.X = sp.sparse.csr_matrix(adata_sub.X)
adata_sub.raw = adata_sub

Thank you for any help and advice!
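
For reference, here is a minimal sketch of the per-cell reasoning behind the question, using only NumPy and made-up toy counts. It is not the tutorial's code: the helper `log_normalize` and the fixed `target_sum` are assumptions for illustration. It only shows that when each cell is scaled by its own total to a fixed target, subsetting before or after normalization gives the same values. It does not model scran's pooling, where size factors depend on the other cells in the object.

```python
import numpy as np

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to a fixed target_sum, then apply log1p.

    Hypothetical helper for illustration; each row is processed
    independently of every other row.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # per-cell library size
    return np.log1p(counts / totals * target_sum)

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(6, 5))  # toy cells x genes count matrix
counts[:, 0] += 1                       # make sure no cell has a zero total

# Boolean mask standing in for "cells in the clusters of interest"
keep = np.array([True, False, True, True, False, True])

subset_then_norm = log_normalize(counts[keep])
norm_then_subset = log_normalize(counts)[keep]

# Both orders agree because no value depends on other cells
same = np.allclose(subset_then_norm, norm_then_subset)
print(same)
```

This equivalence relies on the target being a fixed number. If the scaling target is derived from the dataset itself (for example, a median of total counts across cells), the result would depend on which cells are present, much as it does with pooled size factors.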