bin_to_cell functioning

Dear bin2cell team,
thank you very much for creating such a useful tool. I have a few questions after running a simplified version of the [demo notebook](https://nbviewer.org/github/Teichlab/bin2cell/blob/main/notebooks/demo.ipynb).

After running `b2c.bin_to_cell()` on `adata` to create `cdata`, I noticed that all the bin-level information is lost in the process and that `object_id` is the new (and only) cell identifier. Below is a snippet from the console output:

```
>>> adata
AnnData object with n_obs × n_vars = 6132629 × 18823
    obs: 'in_tissue', 'array_row', 'array_col', 'n_counts', 'destripe_factor', 'n_counts_adjusted', 'labels_he', 'labels_expanded'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial', 'bin2cell'
    obsm: 'spatial', 'spatial_cropped_150_buffer'
>>> cdata = b2c.bin_to_cell(adata, labels_key="labels_he", spatial_keys=["spatial", "spatial_cropped_150_buffer"])
>>> cdata
AnnData object with n_obs × n_vars = 61842 × 18823
    obs: 'object_id', 'bin_count', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial'
    obsm: 'spatial', 'spatial_cropped_150_buffer'
```

I have several questions about how `b2c.bin_to_cell()` is intended to work. The answers may help with troubleshooting and with interpreting results in downstream analyses:
 
1. Is the aggregation of counts based on `n_counts` or on `n_counts_adjusted`? Would it be worth keeping both?
2. Maybe I'm missing something here, but is the number of bins (and counts) aggregated into each cell stored anywhere in `cdata`? My understanding is that this is lost in the aggregation. I would suggest including, at least, the number of bins that were aggregated into each cell and the mean number of transcripts they contained.
3. Along the same lines, what happens to the bins that are not assigned to any cell? Is that information removed from `cdata`? I understand it may not belong in `cdata` (and may even be outside the scope of this function), but it would be very useful to be able to profile how noisy the assignment is with respect to the gene expression of those unassigned bins. I am not a developer, but maybe the expression of the extracellular matrix could be used to compute some sort of probability (or probabilities) of expression purity in the `bin_to_cell` call.
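To make points 2 and 3 concrete, here is a minimal pandas sketch of the kind of per-cell summary I have in mind. It is not bin2cell code: the toy `obs` table stands in for `adata.obs`, the helper `summarize_bins` and its output column names are hypothetical, and I am assuming that a label of `0` in `labels_he` means "not assigned to any cell".

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical values); on real data this
# would be adata.obs with the same three columns.
obs = pd.DataFrame(
    {
        "labels_he": [0, 0, 1, 1, 1, 2, 2, 0],
        "n_counts": [3, 5, 10, 12, 8, 20, 22, 1],
        "n_counts_adjusted": [4.0, 6.0, 11.0, 13.0, 9.0, 21.0, 23.0, 2.0],
    },
    index=[f"bin_{i}" for i in range(8)],
)


def summarize_bins(obs, labels_key="labels_he"):
    """Hypothetical helper: per-cell bin statistics plus an unassigned-bin mask.

    Assumes label 0 marks bins that belong to no cell.
    """
    # Keep only bins that carry a nonzero (assigned) label.
    assigned = obs[obs[labels_key] > 0]
    # One row per label: number of bins and mean counts per bin.
    summary = assigned.groupby(labels_key).agg(
        n_bins=("n_counts", "size"),
        mean_counts_per_bin=("n_counts", "mean"),
        mean_adjusted_per_bin=("n_counts_adjusted", "mean"),
    )
    # Boolean mask over all bins, True where the bin is unassigned.
    unassigned_mask = obs[labels_key] == 0
    return summary, unassigned_mask


summary, unassigned_mask = summarize_bins(obs)
print(summary)
print("unassigned bins:", int(unassigned_mask.sum()))
```

On the real object, I imagine `adata[unassigned_mask.values]` could then be used to look at the expression of the unassigned bins, but I have not checked how this interacts with the rest of the bin2cell workflow.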

Thank you in advance, looking forward to using your tool on my own samples!
Sergio

bin_to_cell functioning #51