Inconsistent preprocessing for scPRINT #81

**Describe the bug**
The feature selection (highly variable gene) step appears to be missing from the scPRINT method script:

https://github.com/openproblems-bio/task_batch_integration/blob/4e9ebb65fca55d68c92b00d8946e3e44fa63a8b1/src/methods/scprint/script.py#L47-L57

**Expected behavior**
Feature selection should be controlled by a command line argument (`--n_hvg`), as it is for scVI:

https://github.com/openproblems-bio/task_batch_integration/blob/4e9ebb65fca55d68c92b00d8946e3e44fa63a8b1/src/methods/scvi/config.vsh.yaml#L18-L21

https://github.com/openproblems-bio/task_batch_integration/blob/4e9ebb65fca55d68c92b00d8946e3e44fa63a8b1/src/methods/scvi/script.py#L32-L35


**Additional context**
If the model should use the full feature matrix by default, this should be achieved by adjusting the default of the `--n_hvg` parameter, or by adding a new variant here:

https://github.com/openproblems-bio/task_batch_integration/blob/4e9ebb65fca55d68c92b00d8946e3e44fa63a8b1/src/methods/scprint/config.vsh.yaml#L34-L40

`src/methods/scprint/script.py`, lines 47-57:

```python
print("\n>>> Preprocessing data...", flush=True)
preprocessor = Preprocessor(
    min_valid_genes_id=min(0.9 * adata.n_vars, 10000),  # 90% of features up to 10,000
    # Turn off cell filtering to return results for all cells
    filter_cell_by_counts=False,
    min_nnz_genes=False,
    do_postp=False,
    # Skip ontology checks
    skip_validate=True,
)
adata = preprocessor(adata)
```
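The `min_valid_genes_id` threshold in the scPRINT snippet scales with the number of features passed in, so it would shrink if HVG selection ran before preprocessing. A minimal sketch of that arithmetic follows; the helper name `min_valid_genes_threshold` is hypothetical, and only the expression `min(0.9 * adata.n_vars, 10000)` comes from the script.

```python
def min_valid_genes_threshold(n_vars: int) -> float:
    """Hypothetical helper mirroring the expression passed as `min_valid_genes_id`
    in the scPRINT script: 90% of the input features, capped at 10,000."""
    return min(0.9 * n_vars, 10000)


# Full feature matrix (e.g. 20,000 genes): the 10,000 cap applies.
print(min_valid_genes_threshold(20000))  # → 10000
# After keeping 2,000 HVGs (the scVI default for --n_hvg): 90% of 2,000.
print(min_valid_genes_threshold(2000))  # → 1800.0
```

This only covers the arithmetic; how `Preprocessor` uses the threshold is defined by scPRINT itself.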

`src/methods/scvi/config.vsh.yaml`, lines 18-21:

```yaml
- name: --n_hvg
  type: integer
  default: 2000
  description: Number of highly variable genes to use.
```

`src/methods/scvi/script.py`, lines 32-35:

```python
if par["n_hvg"]:
    print(f"Select top {par['n_hvg']} high variable genes", flush=True)
    idx = adata.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]]
    adata = adata[:, idx].copy()
```
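The scVI logic ranks genes by `hvg_score` in descending order, keeps the top `n_hvg`, and skips selection entirely when `n_hvg` is falsy (e.g. 0 or `None`), which is one way a full-matrix option could be expressed. Below is a minimal, dependency-free sketch of that behavior, assuming a plain list of scores in place of `adata.var["hvg_score"]`; `select_hvg_indices` is a hypothetical helper, not part of either script.

```python
from typing import List, Optional, Sequence


def select_hvg_indices(hvg_scores: Sequence[float], n_hvg: Optional[int]) -> List[int]:
    """Hypothetical helper: return the column indices to keep, mirroring the
    scVI script's HVG step. `hvg_scores` stands in for adata.var["hvg_score"]."""
    all_indices = list(range(len(hvg_scores)))
    if not n_hvg:
        # Same gate as `if par["n_hvg"]:`; 0 or None keeps every feature.
        return all_indices
    # Highest score first, then keep the first n_hvg, like argsort()[::-1][:n_hvg].
    # Note: Python keeps tied scores in their original order, whereas the NumPy
    # expression reverses them, so ties may be ordered differently.
    ranked = sorted(all_indices, key=lambda i: hvg_scores[i], reverse=True)
    return ranked[:n_hvg]


scores = [0.1, 0.9, 0.5, 0.7]
print(select_hvg_indices(scores, 2))     # → [1, 3]
print(select_hvg_indices(scores, 0))     # → [0, 1, 2, 3]
print(select_hvg_indices(scores, None))  # → [0, 1, 2, 3]
```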

`src/methods/scprint/config.vsh.yaml`, lines 34-40:

```yaml
variants:
  scprint_large:
    model_name: "large"
  scprint_medium:
    model_name: "v2-medium"
  scprint_small:
    model_name: "small"
```
