
bin_to_cell functioning #51

@Sergio-ote


Dear bin2cell team,
thank you very much for creating such a useful tool. I have a few questions after running a simplified version of the demo notebook.

After running b2c.bin_to_cell() on adata to create cdata, I noticed that all the bin-level information is lost in the process and that object_id becomes the new (and only) cell identifier. Below is a snippet of the console output:

```
>>> adata
AnnData object with n_obs × n_vars = 6132629 × 18823
    obs: 'in_tissue', 'array_row', 'array_col', 'n_counts', 'destripe_factor', 'n_counts_adjusted', 'labels_he', 'labels_expanded'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial', 'bin2cell'
    obsm: 'spatial', 'spatial_cropped_150_buffer'
>>> cdata = b2c.bin_to_cell(adata, labels_key="labels_he", spatial_keys=["spatial", "spatial_cropped_150_buffer"])
>>> cdata
AnnData object with n_obs × n_vars = 61842 × 18823
    obs: 'object_id', 'bin_count', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial'
    obsm: 'spatial', 'spatial_cropped_150_buffer'
```
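To make the aggregation step concrete, here is a minimal sketch of how bin-level counts can be summed into per-cell counts with a sparse indicator matrix. It assumes that bins with label 0 are unassigned and that aggregation simply sums the rows of the count matrix sharing a label; it illustrates the idea and is not bin2cell's actual implementation. The function name `aggregate_bins` is hypothetical. Note that in this sketch whatever values are stored in `X` get summed, so whether raw or destripe-adjusted counts end up in a cell would depend on what the matrix holds at that point, and unassigned bins are silently dropped.

```python
import numpy as np
import scipy.sparse as sp


def aggregate_bins(X, labels):
    """Sum bin-level counts into per-object counts (hypothetical helper).

    X      : (n_bins, n_genes) count matrix (sparse or dense)
    labels : (n_bins,) integer object labels; 0 is assumed to mean "unassigned"
    Returns the object IDs, the per-object count matrix and the bins per object.
    """
    labels = np.asarray(labels)
    assigned = labels > 0
    # Map each assigned bin to a row index 0..n_objects-1
    object_ids, row = np.unique(labels[assigned], return_inverse=True)
    col = np.flatnonzero(assigned)
    # Indicator matrix: entry (i, j) is 1 when bin j belongs to object i
    indicator = sp.csr_matrix(
        (np.ones(col.size), (row, col)),
        shape=(object_ids.size, labels.size),
    )
    counts = indicator @ X  # unassigned bins have no entry, so they are dropped
    bin_count = np.asarray(indicator.sum(axis=1)).ravel().astype(int)
    return object_ids, counts, bin_count


# Toy example: 4 bins, 2 genes; bin 2 is unassigned
X = sp.csr_matrix(np.array([[1, 0], [2, 1], [0, 3], [4, 0]]))
labels = np.array([1, 1, 0, 2])
ids, counts, nbins = aggregate_bins(X, labels)
print(ids, nbins)        # object IDs [1 2], bins per object [2 1]
print(counts.toarray())  # rows [3, 1] and [4, 0]
```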

I have several questions about how b2c.bin_to_cell() is intended to work; the answers may also help with troubleshooting and with interpreting the results of downstream analyses:

  1. Is the aggregation of counts based on 'n_counts' or on 'n_counts_adjusted'? Wouldn't it be useful to keep both?
  2. Maybe I'm missing something here, but is information about the number of bins (and counts) aggregated into each cell stored anywhere in cdata? I understand that this is lost in the aggregation. I would suggest including, at least, the number of bins aggregated into each cell and the mean number of transcripts those bins contained.
  3. Along the same lines, what happens to the bins that are not assigned to any cell? Is that information removed from cdata? I understand it may not belong in cdata (and may even be outside the scope of this function), but it would be very useful to be able to profile how noisy the assignment is by looking at the gene expression of those unassigned bins. I am not a developer, but perhaps the expression of the extracellular matrix could be used to compute some kind of expression-purity probability for the cells called by bin_to_cell.

Thank you in advance; I'm looking forward to using your tool on my own samples!
Sergio
