
Conversation

@PalanQu commented Nov 2, 2025

No description provided.

@PalanQu force-pushed the feat/available-storage branch 2 times, most recently from 8258b0e to f8e6d87, on November 2, 2025 at 12:39
@PalanQu force-pushed the feat/available-storage branch from f8e6d87 to 53b4111 on November 2, 2025 at 12:40
@blacks1ne (Contributor) commented:

Adding @CassOnMars's comment from the TG Development channel:

The issue at hand is that, presently, workers aren't aware of the distinction between worker-local storage and cluster-wide storage: a worker can check the partition its store is on, but that partition is shared with the other workers on the machine (unless the storage itself is partitioned per store folder, which to my understanding nobody has done yet).

What I think is the right path forward is somewhat more involved and warrants further discussion; the idea would be either of the following:

  1. (Cheap, easy, not great) each worker takes the total storage available on the partition its store lives on, divides that by the total number of workers (with some buffer room for the master process if it too lives on that partition), and then measures the ratio of storage used as a proxy for available storage per worker (a rough sketch of this follows the list)

  2. (Foundationally more correct, more complicated) Diving into OS-specific syscalls to essentially self-containerize the worker process and limit its internal view of available storage
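A minimal sketch of option 1, assuming a Unix-like system where the worker already knows its store path and the number of workers sharing the partition. The `availablePerWorker` helper, the example path, and the buffer size are illustrative, not anything from this PR:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// availablePerWorker estimates a single worker's storage budget by dividing
// the free space on the store's partition by the number of workers sharing
// it, after reserving a fixed buffer for the master process.
func availablePerWorker(storePath string, totalWorkers int, masterBufferBytes uint64) (uint64, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(storePath, &st); err != nil {
		return 0, err
	}
	// Bavail rather than Bfree: count only space available to unprivileged processes.
	free := uint64(st.Bavail) * uint64(st.Bsize)
	if free > masterBufferBytes {
		free -= masterBufferBytes
	} else {
		free = 0
	}
	if totalWorkers < 1 {
		totalWorkers = 1
	}
	return free / uint64(totalWorkers), nil
}

func main() {
	// Hypothetical values: store path, 8 workers on the partition, 10 GiB master buffer.
	per, err := availablePerWorker("/var/lib/node/store", 8, 10<<30)
	if err != nil {
		panic(err)
	}
	fmt.Printf("per-worker budget: %d bytes\n", per)
}
```

As the comments above note, this is only a proxy: every worker sees the same shared number, so one worker overusing the partition silently shrinks everyone else's budget.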

@CassOnMars (Contributor) commented:

I swear I remember writing this somewhere, but I can't recall if it was TG or Discord, so I'm adding my thoughts here so they don't get lost from the topic at hand: the thing I'm not sure about with this approach is that it still doesn't quite address the problem with clustered arrangements. Once a node's workers extend beyond the machine itself, or the node itself doesn't support the relevant syscall, it'll error out or report zero. I'm struggling to find an approach that avoids the more complicated path, i.e. workers actually containerizing themselves, using the relevant syscalls to scope what they have access rights to, etc. Part of this could be alleviated by having worker groups as a configuration option, so that when a worker is determining its own available space, it knows to count only the workers in its own group when dividing available storage.
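A sketch of how the worker-group idea might look, assuming a hypothetical group field in the worker configuration (the `WorkerConfig` fields and `groupShare` helper are illustrative): each worker splits the partition's free space only with peers that declare the same group.

```go
package main

import "fmt"

// WorkerConfig is a hypothetical config entry: workers that share a storage
// pool (e.g. the same partition on the same machine) declare the same Group.
type WorkerConfig struct {
	Name  string
	Group string
}

// groupShare returns the per-worker budget for `self`, counting only the
// workers that belong to the same group when dividing the free space.
func groupShare(freeBytes uint64, self WorkerConfig, all []WorkerConfig) uint64 {
	members := 0
	for _, w := range all {
		if w.Group == self.Group {
			members++
		}
	}
	if members == 0 {
		members = 1
	}
	return freeBytes / uint64(members)
}

func main() {
	workers := []WorkerConfig{
		{"w0", "machine-a"}, {"w1", "machine-a"}, {"w2", "machine-b"},
	}
	// w0 splits 1 TiB with w1 only; w2 is in a different group and doesn't count.
	fmt.Println(groupShare(1<<40, workers[0], workers))
}
```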

@blacks1ne (Contributor) commented Nov 17, 2025:

I was thinking more about having a kind of "worker manager" (or "worker supervisor") running on the remote (slave) nodes and supervising the local worker processes: (re)starting them, exposing centralized metrics including disk space usage, etc.
That would require adding a node launch flag for the supervised core list, e.g. -cores=X-Y,Z
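A rough sketch of parsing such a flag, assuming -cores accepts a comma-separated mix of single indices and inclusive ranges, as the example -cores=X-Y,Z suggests (the parser and the flag wiring are illustrative, not existing node code):

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
	"strings"
)

// parseCores expands a spec like "2-5,8" into the list of supervised core
// indices. The syntax is assumed from the comment above.
func parseCores(spec string) ([]int, error) {
	if spec == "" {
		return nil, nil
	}
	var cores []int
	for _, part := range strings.Split(spec, ",") {
		if lo, hi, ok := strings.Cut(part, "-"); ok {
			start, err1 := strconv.Atoi(lo)
			end, err2 := strconv.Atoi(hi)
			if err1 != nil || err2 != nil || end < start {
				return nil, fmt.Errorf("bad core range %q", part)
			}
			for c := start; c <= end; c++ {
				cores = append(cores, c)
			}
			continue
		}
		c, err := strconv.Atoi(part)
		if err != nil {
			return nil, fmt.Errorf("bad core %q", part)
		}
		cores = append(cores, c)
	}
	return cores, nil
}

func main() {
	spec := flag.String("cores", "", "supervised core list, e.g. 2-5,8")
	flag.Parse()
	cores, err := parseCores(*spec)
	if err != nil {
		panic(err)
	}
	fmt.Println("supervising cores:", cores)
}
```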

@tjsturos (Contributor) commented:

Hmm, this actually ties into what I'm doing right now: allowing a worker to register with the master node rather than needing to be defined in a config.

One of the things I was considering is adding support for a proxy node, or a pseudo-master, that relays start/stop commands but could also have additional features, like storage calculations and automatic worker generation based on the results of those calculations.

This would make things simpler in that you could avoid overages, but it still wouldn't make the worker aware of its own limitations. You'd probably need a hard limit passed from the proxy, or a param flag on individual workers (maybe determined by your deployment script).
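One shape such a hard limit could take on the worker side, assuming a hypothetical --max-store-bytes flag supplied by the proxy or a deployment script (the flag names and the usage check are illustrative only):

```go
package main

import (
	"flag"
	"fmt"
	"io/fs"
	"path/filepath"
)

// storeSize walks the worker's store directory and sums file sizes, so the
// worker can compare its own usage against the externally supplied cap.
func storeSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		total += info.Size()
		return nil
	})
	return total, err
}

func main() {
	maxBytes := flag.Int64("max-store-bytes", 0, "hard per-worker storage cap in bytes (0 = unlimited)")
	storePath := flag.String("store", "./store", "worker store directory")
	flag.Parse()

	used, err := storeSize(*storePath)
	if err != nil {
		panic(err)
	}
	if *maxBytes > 0 && used >= *maxBytes {
		fmt.Printf("store %s at %d/%d bytes: refusing new writes\n", *storePath, used, *maxBytes)
		return
	}
	fmt.Printf("store %s at %d bytes, cap %d\n", *storePath, used, *maxBytes)
}
```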
