Description
Describe the bug
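The scPRINT method appears to be missing a feature selection step: the script preprocesses the data but never subsets it to highly variable genes.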
task_batch_integration/src/methods/scprint/script.py
Lines 47 to 57 in 4e9ebb6
    print("\n>>> Preprocessing data...", flush=True)
    preprocessor = Preprocessor(
        min_valid_genes_id=min(0.9 * adata.n_vars, 10000),  # 90% of features up to 10,000
        # Turn off cell filtering to return results for all cells
        filter_cell_by_counts=False,
        min_nnz_genes=False,
        do_postp=False,
        # Skip ontology checks
        skip_validate=True,
    )
    adata = preprocessor(adata)
Expected behavior
Feature selection should be handled via a command-line argument, as is done for the scVI method:
task_batch_integration/src/methods/scvi/config.vsh.yaml
Lines 18 to 21 in 4e9ebb6
    - name: --n_hvg
      type: integer
      default: 2000
      description: Number of highly variable genes to use.
task_batch_integration/src/methods/scvi/script.py
Lines 32 to 35 in 4e9ebb6
    if par["n_hvg"]:
        print(f"Select top {par['n_hvg']} high variable genes", flush=True)
        idx = adata.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]]
        adata = adata[:, idx].copy()
Additional context
If the full feature matrix is desired for the model by default, this should be achieved by adjusting the --n_hvg parameter in the scPRINT config, or by adding a new variant here:
task_batch_integration/src/methods/scprint/config.vsh.yaml
Lines 34 to 40 in 4e9ebb6
    variants:
      scprint_large:
        model_name: "large"
      scprint_medium:
        model_name: "v2-medium"
      scprint_small:
        model_name: "small"
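
As a rough sketch of what the requested behavior could look like, the selection logic from the scVI script can be factored into a small helper that treats an unset `--n_hvg` as "keep all features". The helper name `top_hvg_indices` is hypothetical and not part of the repository; it uses plain Python so it can be run on its own, and it assumes the input exposes per-gene scores such as `adata.var["hvg_score"]`, as in the scVI snippet.

```python
from typing import Optional, Sequence


def top_hvg_indices(hvg_scores: Sequence[float], n_hvg: Optional[int]) -> list[int]:
    """Return column indices of the n_hvg highest-scoring genes.

    A falsy n_hvg (None or 0) keeps every gene, mirroring the
    `if par["n_hvg"]:` guard in the scVI script.
    """
    all_indices = list(range(len(hvg_scores)))
    if not n_hvg:
        return all_indices
    # Sort indices by descending score and keep the first n_hvg
    ranked = sorted(all_indices, key=lambda i: hvg_scores[i], reverse=True)
    return ranked[:n_hvg]


# Hypothetical use inside scprint/script.py, before the Preprocessor call
# (assumes an --n_hvg argument has been added to the scPRINT config):
#   idx = top_hvg_indices(adata.var["hvg_score"].to_numpy(), par["n_hvg"])
#   adata = adata[:, idx].copy()

scores = [0.1, 0.9, 0.4, 0.7]
print(top_hvg_indices(scores, 2))     # [1, 3]
print(top_hvg_indices(scores, None))  # [0, 1, 2, 3]
```

With a helper like this, a "full feature matrix" variant would only need to leave `n_hvg` unset (or set it to 0) rather than bypassing feature selection in the script itself.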