available storage #459
base: develop
Conversation
Adding @CassOnMars's comment from the TG Development channel: The issue at hand is that, presently, workers aren't aware of the difference between worker-local storage and cluster-wide storage: a worker can check the partition its storage is on, but that partition is shared with the other workers on the machine (unless the storage itself is partitioned per store folder, which to my understanding nobody has done yet). What I think the right path forward is, is somewhat more involved and warrants further discussion: the idea would be either of the following:
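To make the shared-partition point concrete, here is a minimal sketch (Go, Linux/macOS only; the store path is illustrative, not the project's actual layout) of the check a worker can already do today. The number it returns is per filesystem, not per worker, which is exactly the limitation described above.

```go
package main

import (
	"fmt"
	"syscall"
)

// availableBytes reports free space on the filesystem backing path.
// This is a partition-wide figure: every worker whose store lives on
// the same filesystem sees the same value.
func availableBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	// Bavail counts blocks available to unprivileged processes.
	return st.Bavail * uint64(st.Bsize), nil
}

func main() {
	free, err := availableBytes("/var/lib/node/store") // illustrative path
	if err != nil {
		panic(err)
	}
	fmt.Printf("partition free space: %d bytes\n", free)
}
```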
I swear I remember writing this somewhere, but I can't recall if it was TG or Discord, so I'm adding my thoughts here so they don't get lost from the topic at hand: the thing I'm not sure about with this approach is that it still doesn't quite address the problem with clustered arrangements. Once a node's workers extend beyond the machine itself, or the node doesn't support the relevant syscall, the check will error out or report zero. I'm struggling to find an approach that avoids the more complicated path, i.e. workers actually containerizing themselves, using the relevant syscalls to scope what they have access rights to, etc. Part of this could be alleviated by having worker groups as a configuration option, so that when a worker is determining its own available space, it knows to divide available storage only by the count of workers in its own group (see the sketch after this comment).
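A minimal sketch of the worker-groups idea, assuming a hypothetical config shape (none of these names come from the project's actual configuration):

```go
// WorkerGroup is an illustrative config type: one group per machine
// or volume whose workers share a single partition.
type WorkerGroup struct {
	Name    string   // e.g. the machine or volume the group shares
	Workers []string // identifiers of the workers in this group
}

// perWorkerBudget splits the shared partition's free space evenly
// among the workers of one group; workers in other groups (other
// machines, other filesystems) are deliberately not counted.
func perWorkerBudget(partitionFree uint64, g WorkerGroup) uint64 {
	if len(g.Workers) == 0 {
		return 0
	}
	return partitionFree / uint64(len(g.Workers))
}
```

A worker would then divide the partition-wide free-space figure by its own group's roster only, which avoids counting workers that live on other machines or other filesystems.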
I was thinking more about having a kind of "worker manager" (a "worker supervisor") that would run on remote nodes, supervising the local worker processes: (re)starting them, exposing centralized metrics including disk space usage, etc.
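As a rough illustration of the supervisor idea (Linux-only because of syscall.Statfs; the worker path, flags, port, and metric name are all placeholders, not anything the project defines): one process per machine restarts its local worker and serves a single metrics endpoint that includes the store partition's free space.

```go
package main

import (
	"fmt"
	"net/http"
	"os/exec"
	"syscall"
)

// superviseWorker restarts the worker process whenever it exits.
func superviseWorker(path string, args ...string) {
	for {
		cmd := exec.Command(path, args...)
		_ = cmd.Run()
	}
}

func main() {
	go superviseWorker("/usr/local/bin/node-worker", "--core=1")

	http.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		var st syscall.Statfs_t
		if err := syscall.Statfs("/var/lib/node/store", &st); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, "store_partition_free_bytes %d\n", st.Bavail*uint64(st.Bsize))
	})
	_ = http.ListenAndServe(":9100", nil)
}
```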
hmm, this actually ties into what I'm doing right now -- allowing a worker to register with the master node rather than needing to be defined in a config. One of the things I was considering is adding support for a proxy node, or a pseudo-master, that relays start/stop commands, but it could also have additional features, like storage calculations and automatic worker generation based on the results of those calculations. This would make things simpler in that you could avoid overages, but it still wouldn't make the worker aware of its limitations. You'd probably need a hard limit passed from the proxy, or a param flag on individual workers (maybe determined by your deployment script); a sketch of that fallback follows.
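A minimal sketch of that hard-limit fallback, assuming a hypothetical --max-store-bytes flag (the flag name and semantics are made up for illustration; the project defines no such flag today):

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// The proxy/pseudo-master or a deployment script would pass this
	// cap, so the worker never has to guess its share of a shared
	// partition.
	maxStore := flag.Uint64("max-store-bytes", 0,
		"hard cap on store size for this worker; 0 means unlimited")
	flag.Parse()

	if *maxStore > 0 {
		fmt.Printf("worker will refuse writes beyond %d bytes\n", *maxStore)
	}
}
```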