Skip to content

Conversation

@yyuandann
Copy link

No description provided.

@yyuandann yyuandann requested review from UCDNJJ and Copilot July 16, 2025 18:44
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the cluster‐merging workflow by adding parallel DE computation, optional DataFrame output for marker selection, and refactors the final merging API.

  • Introduces return_markers_df and n_jobs parameters across merge_clusters and select_marker_genes with parallelized DE via de_pairs_ebayes_parallel.
  • Refactors final_merge to accept list‐based cluster assignments, include parallelization options, and adds a helper to convert cluster dicts to lists.
  • Updates example scripts and README to demonstrate the new parameters and usage.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
transcriptomic_clustering/merging.py Added return_markers_df/n_jobs params and updated marker‐selection logic
transcriptomic_clustering/markers.py Extended select_marker_genes signature, return type, and DE parallel call
transcriptomic_clustering/final_merging.py Refactored final_merge API, signature, docstring, and added helper function
transcriptomic_clustering/filter_known_modes.py Changed known_modes param to lookup by column name; updated signature logic
transcriptomic_clustering/dimension_reduction.py Adjusted gene count calculation; potential off‐by‐sum bug
transcriptomic_clustering/de_ebayes.py Added process_pair and de_pairs_ebayes_parallel for parallel DE
transcriptomic_clustering/init.py Exposed new functions and final_merge; duplicate import cleanup needed
examples/final_merging_example.py New example demonstrating final_merge with parallel and marker‐DF options
examples/clustering_example.py Example script for iterative clustering; missing imports cause NameError
README.md Documented the examples directory
Comments suppressed due to low confidence (5)

transcriptomic_clustering/markers.py:26

  • The docstring for select_marker_genes should be updated to describe the new parameters return_markers_df and n_jobs, and to reflect the updated return type (Union[pd.DataFrame, set]).
    """

transcriptomic_clustering/final_merging.py:97

  • The docstring for final_merge should be updated to include the added parameters n_jobs and return_markers_df, and to clarify the new return type (tuple of list of lists and markers).
    """

transcriptomic_clustering/filter_known_modes.py:13

  • The docstring should be updated to reflect that known_modes is now expected to be a column name (string) in adata.obs rather than a DataFrame or Series.
        known_modes: Optional[str] = None,

examples/clustering_example.py:7

  • The script uses sys.path but sys is not imported; add import sys at the top of the file.
sys.path.insert(1, '/allen/programs/celltypes/workgroups/rnaseqanalysis/dyuan/tool/transcriptomic_clustering/')

examples/clustering_example.py:17

  • transcriptomic_clustering module is not imported, so calling transcriptomic_clustering.normalize will raise a NameError; add import transcriptomic_clustering or import normalize directly.
adata=transcriptomic_clustering.normalize(adata)

vidx_bool = np.zeros((adata.n_vars,), dtype=bool)
vidx_bool[vidx] = True
n_genes = len(vidx)
n_genes = sum(vidx)
Copy link

Copilot AI Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using sum(vidx) will sum the index positions rather than count the number of genes; consider using len(vidx) or vidx_bool.sum() to get the correct gene count.

Suggested change
n_genes = sum(vidx)
n_genes = len(vidx)

Copilot uses AI. Check for mistakes.
from .diff_expression import de_pairs_chisq, vec_chisq_test
from .de_ebayes import de_pairs_ebayes
from .de_ebayes import de_pairs_ebayes, de_pairs_ebayes_parallel
from .merging import merge_clusters
Copy link

Copilot AI Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate import of merge_clusters; remove the redundant line to keep the module init.py clean.

Suggested change
from .merging import merge_clusters

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant