Description
Describe the bug
The final step of the pipeline fails with what looks like a parameter error, as described in this Stack Overflow issue:
To Reproduce
I've cloned the repo and changed some of the configs to run in a Slurm context with no internet access. Everything is created and analyzed as expected except the final file.
sbatch -p ei-cb -J predector_test -o predector_test.%j.log -c 1 --mem 10G --wrap " source nextflow-22.04.0_CBG && nextflow run ~/singularity/predector/predector/main.nf --phibase /ei/cb/common/Databases/predector/phi-base_current.fas --pfam_hmm /ei/cb/common/Databases/predector/Pfam-A.hmm.gz --pfam_dat /ei/cb/common/Databases/predector/Pfam-A.hmm.dat.gz --dbcan /ei/cb/common/Databases/predector/dbCAN-HMMdb-V11.txt --effectordb /ei/cb/common/Databases/predector/effectordb.hmm.gz -profile test -with-singularity ~/singularity/predector/predector-1.2.7.sif -resume ~/singularity/predector/predector/ -c ~/singularity/predector/predector/nextflow.config -with-report"
Expected behavior
Expected to get the *rank_result.tsv file for the test.
Error Log
Error executing process > 'rank_results (test_set)'
Caused by:
Process rank_results (test_set) terminated with an error exit status (2)
Command executed:
predutils load_db --mem "2" tmp.db results.ldjson
predutils rank --mem "2" --dbcan dbcan.txt --pfam pfam.txt --outfile "test_set-ranked.tsv" --secreted-weight "2" --sigpep-good-weight "0.003" --sigpep-ok-weight "0.0001" --single-transmembrane-weight "-0.7" --multiple-transmembrane-weight "-1.0" --deeploc-extracellular-weight "1.3" --deeploc-intracellular-weight "-1.3" --deeploc-membrane-weight "-0.25" --targetp-mitochondrial-weight "-0.5" --effectorp1-weight "0.5" --effectorp2-weight "2.5" --effectorp3-apoplastic-weight "0.5" --effectorp3-cytoplasmic-weight "0.5" --effectorp3-noneffector-weight "-2.5" --deepredeff-fungi-weight "0.1" --deepredeff-oomycete-weight "0.0" --effector-homology-weight "2" --virulence-homology-weight "0.5" --lethal-homology-weight "-2" --tmhmm-first-60-threshold "10" tmp.db
rm -f tmp.db
Command exit status:
2
Command output:
(empty)
DataFrame.dtypes for data must be int, float, bool or category. When
categorical type is supplied, DMatrix parameter enable_categorical must
be set to True. Invalid columns:signalp3_nn_d
Traceback (most recent call last):
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/main.py", line 253, in main
rank_runner(args)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1577, in runner
raise e
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1575, in runner
inner(con, cur, args)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1561, in inner
df["effector_score"] = run_ltr(df)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/predectorutils/subcommands/rank.py", line 1503, in run_ltr
dmat = xgb.DMatrix(df_features)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/core.py", line 532, in inner_f
return f(**kwargs)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/core.py", line 643, in init
handle, feature_names, feature_types = dispatch_data_backend(
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 896, in dispatch_data_backend
return _from_pandas_df(data, enable_categorical, missing, threads,
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 345, in _from_pandas_df
data, feature_names, feature_types = _transform_pandas_df(
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 283, in _transform_pandas_df
_invalid_dataframe_dtype(data)
File "/opt/conda/envs/predector/lib/python3.9/site-packages/xgboost/data.py", line 247, in _invalid_dataframe_dtype
raise ValueError(msg)
ValueError: DataFrame.dtypes for data must be int, float, bool or category. When
categorical type is supplied, DMatrix parameter enable_categorical must
be set to True. Invalid columns:signalp3_nn_d
Operating system (please enter the following information as appropriate):
- OS/Linux distribution: CentOS
- Dependency management: Singularity
- Linux HPC
Additional context
I think changing xgb.DMatrix(df_features) to xgb.DMatrix(df_features, enable_categorical=True) should fix it.
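One caveat on that suggestion: going by the error message, enable_categorical appears to cover only columns with the pandas category dtype. If signalp3_nn_d is arriving as an object column (for example numbers stored as strings, or a column mixing floats and None), the flag alone may not be enough, and converting the column to a numeric dtype before building the DMatrix might be needed. The sketch below shows that idea using pandas only. The helper name coerce_features_to_numeric and the sample data are hypothetical and are not taken from predectorutils; the real cause of the object dtype in rank.py has not been confirmed here.

```python
import pandas as pd


def coerce_features_to_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where columns that are not already numeric,
    bool, or category are parsed as numbers. Values that cannot be
    parsed become NaN.

    Hypothetical helper for illustration; not part of predectorutils.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        already_ok = (
            pd.api.types.is_numeric_dtype(dtype)
            or pd.api.types.is_bool_dtype(dtype)
            or isinstance(dtype, pd.CategoricalDtype)
        )
        if not already_ok:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Made-up stand-in for the feature table; the assumption is that
# signalp3_nn_d ends up with object dtype in the failing run.
df_features = pd.DataFrame(
    {
        "signalp3_nn_d": pd.Series(["0.81", None, "0.12"], dtype=object),
        "effectorp2": [0.9, 0.1, 0.5],
    }
)

clean = coerce_features_to_numeric(df_features)
print(clean.dtypes)
```

The cleaned frame could then be passed to xgb.DMatrix(clean) in place of df_features. Whether NaN is an acceptable stand-in for missing SignalP 3 scores is a question for the maintainers, since it depends on how the ranking model was trained.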