Hi,
I've been experimenting with DDS (I run the up-to-date master branch of DDS) on Virgo and encountered two issues:
- When my custom batch script configuration contained `#SBATCH --mem-per-cpu`, I got the following error:
  `srun: fatal: cpus_per_task set by two different environment variables SLURM_CPUS_PER_TASK=2 != SLURM_TRES_PER_TASK=cpu:1`
  This can be fixed by explicitly passing `--cpus-per-task` in the `srun` invocation, and I believe it is connected with the change in Slurm behavior described here: https://docs.icer.msu.edu/2023-05-04_LabNotebook_srun_threading_changes/
- The second one I don't understand: the command passed to `srun` via `bash -c` was not being executed. It is somehow fixed by dumping the script into a file and passing that file to `bash`.
Here's the diff with changes I made in job.slurm.in file:
@@ -16,8 +16,10 @@
# continue waiting for child processes by any means
trap -- '' SIGINT SIGTERM
+echo 'trap '"'"'kill $PID && wait'"'"' SIGINT SIGTERM; eval JOB_WRK_DIR=%DDS_AGENT_ROOT_WRK_DIR%/${SLURM_JOB_NAME}_${SLURM_JOBID}_${SLURMD_NODENAME}; mkdir -p $JOB_WRK_DIR; cd $JOB_WRK_DIR; cp %DDS_SCOUT% $JOB_WRK_DIR/; ./DDSWorker.sh & PID=$!; wait' > srun_script.sh
+
# execute DDS Scoullt
-srun --no-kill --kill-on-bad-exit=0 --output=slurm-%j-%N.out /usr/bin/env bash -c 'trap '"'"'kill $PID && wait'"'"' SIGINT SIGTERM; eval JOB_WRK_DIR=%DDS_AGENT_ROOT_WRK_DIR%/${SLURM_JOB_NAME}_${SLURM_JOBID}_${SLURMD_NODENAME}; mkdir -p $JOB_WRK_DIR; cd $JOB_WRK_DIR; cp %DDS_SCOUT% $JOB_WRK_DIR/; ./DDSWorker.sh & PID=$!; wait' &
+srun --cpus-per-task $SLURM_CPUS_PER_TASK --no-kill --kill-on-bad-exit=0 --output=slurm-%j-%N.out /usr/bin/env bash 'srun_script.sh' &
wait

BTW, is there an intended way to reserve >1 CPU per slot for multithreaded tasks?
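
One caveat with the diff above: Slurm exports `SLURM_CPUS_PER_TASK` only when `--cpus-per-task` was actually requested, so `--cpus-per-task $SLURM_CPUS_PER_TASK` would receive an empty value otherwise. Below is a minimal sketch of both workarounds with a fallback value and a quoted here-document instead of the nested `'"'"'` quoting. The helper name `dds_cpus_per_task`, the temporary file, and the `echo` standing in for `./DDSWorker.sh` are assumptions for illustration only, not part of DDS, and the final line only prints the `srun` command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and the placeholder worker command are
# assumptions for illustration, not part of DDS.

# Slurm exports SLURM_CPUS_PER_TASK only when --cpus-per-task was requested,
# so fall back to 1 rather than passing an empty value to srun.
dds_cpus_per_task() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Write the per-node script with a quoted here-document: nothing is expanded
# at write time, so $PID and $! reach the file literally.
script_file="$(mktemp)"
cat > "$script_file" <<'EOF'
trap 'kill $PID && wait' SIGINT SIGTERM
echo "placeholder for ./DDSWorker.sh" &
PID=$!
wait
EOF

# Print the command the job script would run (srun itself is not invoked here).
echo "srun --cpus-per-task $(dds_cpus_per_task) --no-kill --kill-on-bad-exit=0 /usr/bin/env bash $script_file"
```

With a quoted delimiter (`<<'EOF'`) the script body can be written as ordinary multi-line shell, which may also make it easier to see why the original `bash -c` string was not being executed.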
Regards,
Bartosz