autonice is a load-balancing script for users of the Slurm workload
manager.
autonice runs in the background and periodically checks which
proportion of the available computational resources the user is using.
If it is more than their fair share, their array jobs are deprioritized.
If it is less than their fair share, they are prioritized. Non-array
jobs are always ignored by autonice.
Priorities are adjusted by setting nice values, so autonice will
overwrite all manually set nice values.
The current version of autonice is hard-coded for use in the infai_1
, infai_2 and infai_3 partitions of the Slurm instance at the
University of Basel. For autonice to work effectively, every user must
run autonice for all partitions they use.
Start autonice in the background with:
nohup ./autonice.py [--log-file LOG_FILE] <partition> &replacing <partition> with a Slurm partition, such as infai_1,
infai_2 and infai_3. autonice will keep running after you log out.
For multiple partitions, use multiple invocations of autonice.
Running autonice under nohup redirects all output to a file called
nohup.out, which you can safely delete when autonice is not
running. Deleting the file while autonice is running will likely lead
to quirky behavior of the file system.
nohup ./autonice.py --log-file ~/autonice_infai_1.log infai_1 &
nohup ./autonice.py --log-file ~/autonice_infai_2.log infai_2 &
nohup ./autonice.py --log-file ~/autonice_infai_3.log infai_3 &
exitTo stop autonice, for example before a restart, a brute-force
method is
killall pythonbut of course this is not advisable if you run other Python processes. A safer alternative is
ps x | grep autoniceand then kill just the relevant process IDs.
You can set the amount of memory sbatch allocates to each core with
the --mem-per-cpu option. (Note that cpu refers to a core/processor
in Slurm parameter strings.) Autonice assumes that you only allocate as
much memory as is available per core. This is 3872M on infai_1,
6354M on infai_2 and 4028M on infai_3. If you need more memory,
you must allocate multiple cores to each task by using the
--cpus-per-task option.
To let Slurm run job 123 before job 456, you can use the command
scontrol update dependency=123 jobid=456. This is useful to order your
own jobs, but also if you want to let someone else's jobs finish before
yours start.
Many. The current version of autonice is very much a prototype.
See notes.org for some information on known limitations, RFEs etc.
- Ignore non-array jobs.
- Count the number of used cores instead of running jobs.
- Use more robust format string for obtaining pending jobs.
- Port code from Bash to Python.
- Ignore non-array jobs.
- Log file configurable (now defaults to stdout).
- First public release of
autonice.
autonice is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
autonice is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.