-
Notifications
You must be signed in to change notification settings - Fork 3
update finalMerging example #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the cluster‐merging workflow by adding parallel DE computation, optional DataFrame output for marker selection, and refactors the final merging API.
- Introduces
return_markers_dfandn_jobsparameters acrossmerge_clustersandselect_marker_geneswith parallelized DE viade_pairs_ebayes_parallel. - Refactors
final_mergeto accept list‐based cluster assignments, include parallelization options, and adds a helper to convert cluster dicts to lists. - Updates example scripts and README to demonstrate the new parameters and usage.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| transcriptomic_clustering/merging.py | Added return_markers_df/n_jobs params and updated marker‐selection logic |
| transcriptomic_clustering/markers.py | Extended select_marker_genes signature, return type, and DE parallel call |
| transcriptomic_clustering/final_merging.py | Refactored final_merge API, signature, docstring, and added helper function |
| transcriptomic_clustering/filter_known_modes.py | Changed known_modes param to lookup by column name; updated signature logic |
| transcriptomic_clustering/dimension_reduction.py | Adjusted gene count calculation; potential off‐by‐sum bug |
| transcriptomic_clustering/de_ebayes.py | Added process_pair and de_pairs_ebayes_parallel for parallel DE |
| transcriptomic_clustering/init.py | Exposed new functions and final_merge; duplicate import cleanup needed |
| examples/final_merging_example.py | New example demonstrating final_merge with parallel and marker‐DF options |
| examples/clustering_example.py | Example script for iterative clustering; missing imports cause NameError |
| README.md | Documented the examples directory |
Comments suppressed due to low confidence (5)
transcriptomic_clustering/markers.py:26
- The docstring for select_marker_genes should be updated to describe the new parameters
return_markers_dfandn_jobs, and to reflect the updated return type (Union[pd.DataFrame, set]).
"""
transcriptomic_clustering/final_merging.py:97
- The docstring for final_merge should be updated to include the added parameters
n_jobsandreturn_markers_df, and to clarify the new return type (tuple of list of lists and markers).
"""
transcriptomic_clustering/filter_known_modes.py:13
- The docstring should be updated to reflect that
known_modesis now expected to be a column name (string) inadata.obsrather than a DataFrame or Series.
known_modes: Optional[str] = None,
examples/clustering_example.py:7
- The script uses
sys.pathbutsysis not imported; addimport sysat the top of the file.
sys.path.insert(1, '/allen/programs/celltypes/workgroups/rnaseqanalysis/dyuan/tool/transcriptomic_clustering/')
examples/clustering_example.py:17
transcriptomic_clusteringmodule is not imported, so callingtranscriptomic_clustering.normalizewill raise a NameError; addimport transcriptomic_clusteringor importnormalizedirectly.
adata=transcriptomic_clustering.normalize(adata)
| vidx_bool = np.zeros((adata.n_vars,), dtype=bool) | ||
| vidx_bool[vidx] = True | ||
| n_genes = len(vidx) | ||
| n_genes = sum(vidx) |
Copilot
AI
Jul 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using sum(vidx) will sum the index positions rather than count the number of genes; consider using len(vidx) or vidx_bool.sum() to get the correct gene count.
| n_genes = sum(vidx) | |
| n_genes = len(vidx) |
| from .diff_expression import de_pairs_chisq, vec_chisq_test | ||
| from .de_ebayes import de_pairs_ebayes | ||
| from .de_ebayes import de_pairs_ebayes, de_pairs_ebayes_parallel | ||
| from .merging import merge_clusters |
Copilot
AI
Jul 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate import of merge_clusters; remove the redundant line to keep the module init.py clean.
| from .merging import merge_clusters |
No description provided.