Normalizing subsetted data

Hi @LuckyMD, I've been reprocessing some old data using your single-cell tutorial workflow and have a best-practices question. (I'm not sure whether this is the right place for it, or whether it should be moved to the new scverse Discourse group.)

I have an `adata` object that is scran-normalized. I want to take a subset of clusters from it and create a new `adata_sub` object with its own dimensional reduction to investigate a subpopulation of interest. My understanding is that, had I opted for a basic log normalization, I would not need to re-normalize `adata_sub`, because log normalization is done on a per-cell basis. However, because scran normalization uses a coarse clustering of the cells present in the object, I would want to re-normalize `adata_sub` if `adata` had been normalized via scran, correct?

What would be the best way to subset and re-normalize `adata_sub`? I am not sure which steps of the original scran normalization need to be repeated and which can be omitted. For instance, I would want to perform a new clustering of the subsetted data for scran normalization and get new size factors, but I wouldn't need to set `adata_sub.layers['counts'] = adata_sub.X.copy()`, as `adata_sub` contains a subsetted `counts` layer from `adata` (correct?). Would I need to set `adata_sub.raw = adata_sub` again in this scenario? Here is my current attempt:

```python
import scanpy as sc
import scipy as sp
import scipy.sparse  # makes sp.sparse available

# subset adata to clusters of interest
adata_sub = adata[adata.obs['leiden_r1.0'].isin(['1', '3', '5'])].copy()

# perform clustering for scran normalization
adata_sub_pp = adata_sub.copy()
#sc.pp.normalize_per_cell(adata_sub_pp, counts_per_cell_after = 1e6) - can we omit this since we did it for adata?
#sc.pp.log1p(adata_sub_pp) - can we omit this since we did it for adata?
sc.pp.pca(adata_sub_pp, n_comps = 15)
sc.pp.neighbors(adata_sub_pp)
sc.tl.leiden(adata_sub_pp, key_added = 'groups', resolution = 0.5)

# preprocess variables for scran normalization
input_groups = adata_sub_pp.obs['groups']
data_mat = adata_sub.X.T
```
```python
%%R -i data_mat -i input_groups -o size_factors

size_factors = sizeFactors(computeSumFactors(SingleCellExperiment(list(counts = data_mat)), 
                                             clusters = input_groups, 
                                             min.mean = 0.1))
```
```python
del adata_sub_pp

adata_sub.obs['size_factors'] = size_factors # overwrites existing ['size_factors'] from adata

# adata_sub.layers['counts'] = adata_sub.X.copy() - this can be omitted?

# Normalize adata_sub
adata_sub.X /= adata_sub.obs['size_factors'].values[:,None]
sc.pp.log1p(adata_sub) # should this be omitted?
adata_sub.X = sp.sparse.csr_matrix(adata_sub.X)
adata_sub.raw = adata_sub
```
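
To make the distinction I'm asking about concrete, here is a minimal NumPy sketch on toy counts (not real data, and not the scran workflow itself). It assumes a plain library-size log normalization, with `log_normalize`, `target_sum` and `keep` being names I made up for illustration. The second part uses a mean-centered library-size factor purely as a stand-in for "a size factor defined relative to the other cells in the object"; it is not scran's pooling method.

```python
import numpy as np

# Toy cells x genes count matrix (made-up values, no zero-count cells)
counts = np.array([
    [1, 2, 3],
    [4, 0, 4],
    [10, 5, 5],
    [2, 2, 2],
    [0, 9, 1],
    [7, 7, 16],
], dtype=float)

def log_normalize(mat, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p."""
    totals = mat.sum(axis=1, keepdims=True)
    return np.log1p(mat / totals * target_sum)

keep = np.array([0, 2, 5])  # hypothetical subset of cells

# Per-cell normalization: the order of subsetting and normalizing does not matter
per_cell_invariant = np.allclose(log_normalize(counts)[keep],
                                 log_normalize(counts[keep]))

# Stand-in for a population-dependent size factor (NOT scran's algorithm):
# library size divided by the mean library size of the cells in the object
totals = counts.sum(axis=1)
sf_full = totals / totals.mean()
sf_sub = totals[keep] / totals[keep].mean()
centered_changes = not np.allclose(sf_full[keep], sf_sub)

print(per_cell_invariant, centered_changes)
```

If this picture is right, the per-cell result is unaffected by subsetting, while anything estimated relative to the rest of the cells shifts once the population changes, which is why I suspect the scran size factors need to be recomputed for `adata_sub`.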

Thank you for any help and advice!

Normalizing subsetted data #90
