Skip to content

Conversation

@lzrd
Copy link

@lzrd lzrd commented Dec 11, 2025

sp-runner from sp-tools is designed to run in a persistent, stateful environment. This PR enables buildomat to work with persistent agents.

In this and the PRs leading up to it, tests exist to ensure that existing agents will continue to work as before without changing APIs, file formats, or protocols.

This PR allows agents to be configured to run as a root or non-root user and to anchor directories and files at places other than the previous fixed paths.

lzrd and others added 20 commits December 9, 2025 12:21
Mostly mechanical fixes: removing needless borrows, adding lifetime
annotations, using modern numeric constants (i32::MAX), and suppressing
warnings for dropshot-generated dead code.

Notable equivalences:
- .as_bytes().len() to .len() on String: both return byte count
- .trim().split_whitespace() to .split_whitespace(): split_whitespace
  already skips leading/trailing whitespace
- .into_iter() to .iter() on slices: both yield &T
Co-authored-by: Joshua M. Clulow <jmc@oxide.computer>
Co-authored-by: Joshua M. Clulow <jmc@oxide.computer>
Co-authored-by: Joshua M. Clulow <jmc@oxide.computer>
Co-authored-by: Joshua M. Clulow <jmc@oxide.computer>
Co-authored-by: Joshua M. Clulow <jmc@oxide.computer>
Co-authored-by: Joshua M. Clulow <jmc@oxide.computer>
Tests in github/testdata and github/server were using
buildomat_github_database::types::JobFile which lacks the
parse_content_at_path method. Switch to buildomat_jobsh::jobfile::JobFile
which has the parsing functionality needed by the tests.
jobsh/src/jobfile.rs:
  Add 16 tests for JobFileSet and JobFile parsing. The JobFileSet
  struct has complex logic for duplicate name detection, dependency
  cycle detection, and reference validation that was previously
  untested.

github/database/src/tables/check_run.rs:
  Add 4 tests for CheckRun::get_dependencies(). This method
  deserializes JSON dependency data and was not covered by existing
  enum serialization tests.

github/database/src/tables/user.rs:
  Add 2 tests for UserType::from_github_str(). The error path for
  invalid GitHub API strings was untested.

types/src/config.rs:
  Add 2 tests for ConfigFileDiag inheritance. The explicit
  "inherit: true" case and mixed inheritance scenarios were not
  covered by existing tests.
Add tests that verify the current values of the hardcoded path constants:
- CONFIG_PATH: /opt/buildomat/etc/agent.json
- JOB_PATH: /opt/buildomat/etc/job.json
- AGENT: /opt/buildomat/lib/agent

Also adds illumos-specific tests for:
- METHOD: /opt/buildomat/lib/start.sh
- MANIFEST: /var/svc/manifest/site/buildomat-agent.xml

These tests document the current behavior and will serve as a baseline
for backward compatibility when these constants are refactored to
functions in subsequent branches.
Add test that verifies the current value of SOCKET_PATH constant:
- SOCKET_PATH: /var/run/buildomat.sock

This test documents the current behavior and will serve as a baseline
for backward compatibility when this constant is refactored to a
function in subsequent branches.
Replace CONFIG_PATH, JOB_PATH, and AGENT constants with config_path(),
job_path(), and agent_path() functions that return PathBuf. This is a
pure refactoring with no behavior change - the functions return exactly
the same values as the original constants.

This prepares for future configurability via environment variables.
Replace SOCKET_PATH constant with socket_path() function. The function
returns String and is called from both mod.rs (client connection) and
server.rs (listener binding).

This is a pure refactoring with no behavior change - the function
returns exactly the same value as the original constant.
Config files are backward compatible through an optional file version
field and use of accessor functions.
The new fields use serde defaults so existing config files continue to work.

New fields for persistent mode support:
- version: config file version (defaults to 1 for backward compatibility)
- opt_base: optional base directory (defaults to `/opt/buildomat`)
- persistent: flag for persistent/stateful mode vs the existing ephemeral agents.
These methods will replace the standalone path functions, allowing
paths to be derived from the config file rather than hardcoded.

- opt_base(): returns base directory (default `/opt/buildomat`)
- job_path(): path to job state file
- job_done_path(): path to job completion marker
- agent_path(): path to agent binary
- control_program_path(): /usr/bin/bmat or $OPT/bin/bmat in persistent mode
- socket_path(): `/var/run/buildomat.sock` or `$OPT/run/buildomat.sock`
- method_path(): SMF method script path (illumos only)
- mark_job_done(): renames job.json to job-done.json on completion
Add command line support for persistent mode configuration:
- Add config_path field to Agent struct
- Add DEFAULT_CONFIG_PATH constant
- Add parse_config_path_arg() to parse -c flag before hiercmd
- Add --persistent and --opt-base flags to install subcommand
- Update cmd_install to use config_path from context

CLI usage:
```
  agent -c /path/to/config.json install --persistent --opt-base /home/user/buildomat BASEURL TOKEN
````
Use ConfigFile methods for paths in cmd_install:
- cf.agent_path() instead of standalone agent_path()
- cf.control_program_path() for bmat installation
- cf.method_path() for SMF method script

Skip SMF/systemd/ZFS setup in persistent mode:
- illumos: skip SMF manifest, method script, zfs create, svccfg import
- linux: skip input dir creation, systemd unit, daemon-reload, service start

Remove now-unused agent_path() function and METHOD constant.
Server side:
- listen() now takes socket_path parameter instead of calling socket_path()
- Creates parent directory if needed (for persistent mode custom paths)
- cmd_run passes cf.socket_path() to listen()

Client side (bmat):
- socket_path() reads BUILDOMAT_SOCKET env var if set
- Defaults to /var/run/buildomat.sock if not set
- Agent sets BUILDOMAT_SOCKET when running jobs (next commit)

This allows the socket path to vary based on persistent mode config
while keeping bmat able to find the socket via env var.
Use ConfigFile methods throughout cmd_run for all path computation.
Remove standalone config_path() and job_path() functions.

Add task environment fields to ClientWrap:
- persistent: whether running in persistent mode
- socket_path: for BUILDOMAT_SOCKET env var
- task_path: PATH with $opt_base/bin prepended in persistent mode
- task_home: HOME env var value (current user's $HOME or /root)
- task_user: USER/LOGNAME env var value
- task_cwd: working directory for diagnostic scripts

Update start_diag_script() and task execution to use these precomputed
values instead of duplicating if/else logic. In persistent mode:
- HOME/USER/LOGNAME use current user's environment
- PATH includes $opt_base/bin so bmat is discoverable
- Skip uid/gid setting (runs as current user)
- Set BUILDOMAT_SOCKET for jobs to find control socket

Call cf.mark_job_done() at all job completion points to rename
job.json to job-done.json. This rename is compatible with ephemeral instances
but not needed there. For a persistent instance, it is the most basic, but not
complete, cleanup.
@lzrd lzrd marked this pull request as draft December 11, 2025 02:27
@jclulow
Copy link
Collaborator

jclulow commented Dec 11, 2025

As an FYI, I have just had a quick look and I think I already made some prototype changes in the hubris-ci branch for unprivileged agents, which it'd be good to look at too: main...hubris-ci#diff-84941b7b66007f0c958f880e753558618813c2a5b5fb2e3d77e2b494efa4f2f2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants