Right now the memory and cores in the database are a bit arbitrary - luckily, we have all the data to reason about better mem values (and cores should probably typically be mem/4 as discussed in #60 except in cases where tools use little memory but see significant speedup with more cores).
If we all pushed our memory usage and input sizes to a centralized database, we could both visualize it (similar to how I have done it one-off in this gist) and hopefully automatically make some decisions about memory values in the shared DB.
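As a rough sketch of what the "automatically make some decisions" part could look like (assuming the collected job records end up in something queryable; the CSV file and column names below are hypothetical), one could take a high quantile of observed memory per tool and derive cores as mem/4 per #60:

```python
# Sketch only: derive per-tool mem/cores from collected job metrics.
# "job_metrics.csv" and its columns (tool_id, memory_bytes) are assumptions,
# not an existing format.
import pandas as pd

jobs = pd.read_csv("job_metrics.csv")

GIB = 1024 ** 3

def recommend(df: pd.DataFrame, quantile: float = 0.95) -> pd.DataFrame:
    """Per-tool memory at a high quantile, with cores derived as mem / 4."""
    mem_gib = df.groupby("tool_id")["memory_bytes"].quantile(quantile) / GIB
    rec = mem_gib.round().clip(lower=1).astype(int).to_frame("mem")
    rec["cores"] = (rec["mem"] // 4).clip(lower=1)
    return rec

print(recommend(jobs).head())
```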
However, there are some things for consideration by people who are good at statistics (a rough sketch of how a couple of these could be handled follows the list):
- Cutoffs for high and low memory usage (or just use 95th percentile?) since there are outliers
- Cutoffs for high and low size inputs, since there is usually a lower bound on memory that does not correlate to inputs at all
- Input compression - the mixture of compressed and uncompressed data makes input sizes as a ratio to memory usage kind of a lie
- The current memory limit can cut off jobs that would otherwise have succeeded and thus skew the data; the limit also varies by server, although we do know what it was for each job
- How input size affects memory usage - it is rarely just the input size that matters, but also the actual data itself
- How recent of jobs to consider, since newer data is typically more useful than older data
- Tool versions to consider, since these can drastically affect memory usage, but we also don't necessarily want each +galaxyN version to be separate
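For a couple of the points above (recency, jobs censored by their memory limit, high/low cutoffs), something along these lines could be a starting point; the column names (memory_limit_bytes, finished_at) and thresholds are placeholders, not a proposal:

```python
# Illustrative only; assumes each job record carries the memory limit that
# applied to it (memory_limit_bytes) and a finish timestamp (finished_at).
import pandas as pd

def usable_jobs(df: pd.DataFrame, max_age_days: int = 180) -> pd.DataFrame:
    """Keep recent jobs that were not constrained by their memory limit."""
    cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=max_age_days)
    recent = pd.to_datetime(df["finished_at"], utc=True) >= cutoff
    # Jobs that peaked within ~5% of their limit were probably capped (or
    # OOM-killed) and would bias the distribution downwards, so drop them.
    uncensored = df["memory_bytes"] < 0.95 * df["memory_limit_bytes"]
    return df[recent & uncensored]

def high_low_cutoffs(df: pd.DataFrame) -> pd.DataFrame:
    """5th/95th percentiles per tool as candidate low/high memory cutoffs."""
    q = df.groupby("tool_id")["memory_bytes"].quantile([0.05, 0.95]).unstack()
    q.columns = ["low_bytes", "high_bytes"]
    return q

jobs = pd.read_csv("job_metrics.csv")
print(high_low_cutoffs(usable_jobs(jobs)))
```

The input-size dependence would need something more than a flat quantile, e.g. some kind of regression of memory on (decompressed) input size per tool, but that only makes sense once the compression and lower-bound issues above are sorted out.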