Run a real Dagger engine in the agentloop agent pods (drop the engine-less skip) #538

2026-06-08T06:02:42Z

guettlibot commented

2026-06-08 06:02:42 +00:00

(Migrated from codeberg.org)

Goal

Run the Dagger-backed checks (dart-check → dagger call ... check-fast, and the other dagger call tasks) for real inside the agentloop agent pods, instead of skipping them. Today the agent commits in an engine-less pod, so the dart-check pre-commit hook either fails hard with:

start engine: driver for scheme "image" was not available

or is silently skipped by scripts/precommit_dart_check.sh. The skip is a fallback we want to remove — the agent should get the same validation a developer or CI gets, locally, before it pushes.

What is blocking (diagnosis)

Dagger needs an engine to talk to. The CLI finds one in exactly one of two ways: provision it from a local container runtime (docker/podman/nerdctl), or connect to an existing one via _EXPERIMENTAL_DAGGER_RUNNER_HOST. In the agentloop pod neither exists:

- No container runtime in the pod. command -v docker / podman → nothing. The node runs containerd (k3s), but there is no Docker socket mounted and no rootless runtime in the image.
- _EXPERIMENTAL_DAGGER_RUNNER_HOST is unset, so the CLI falls back to its default engine-image reference and aborts with the driver for scheme "image" error.
- The pod is unprivileged. securityContext is just runAsUser: 1000 / fsGroup: 1000. The Dagger engine (buildkit) needs privileged: true (CAP_SYS_ADMIN, mounts), so it can't run inside the agent container; it has to be a separate, privileged container.
- Version skew. The CLI is pinned to 0.21.4 (flake.nix override of the dagger/nix 0.20.8), but ci/dagger.json declares engineVersion: v0.20.8. Whatever engine we stand up must match the CLI version (v0.21.4), not the version declared in dagger.json.

So nothing is wrong with the Dagger code — the execution context simply has no engine and no way to reach one.
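
The diagnosis above can be reproduced with a small check. This is a minimal sketch, not part of the repo: the function name `dagger_engine_source` and its output labels are made up for illustration. It only looks at the two engine sources named above, the runner-host variable and a local container runtime.

```shell
# Sketch: report which engine source the Dagger CLI could use here.
# Function name and output labels are illustrative, not part of Dagger.
dagger_engine_source() {
  # An explicitly configured engine is checked first in this sketch.
  if [ -n "${_EXPERIMENTAL_DAGGER_RUNNER_HOST:-}" ]; then
    echo "runner-host:${_EXPERIMENTAL_DAGGER_RUNNER_HOST}"
    return 0
  fi
  # Otherwise the CLI needs a local container runtime to start an engine.
  for runtime in docker podman nerdctl; do
    if command -v "$runtime" >/dev/null 2>&1; then
      echo "runtime:${runtime}"
      return 0
    fi
  done
  # Neither source exists: this is the state of the agentloop pod today.
  echo "none"
  return 1
}
```

Run inside the agent pod as it is described here, `dagger_engine_source` would be expected to print `none` and return a non-zero status.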

Plan

Primary approach: Dagger engine sidecar in the agent pod

agentloop passes the operator-supplied PodSpec through verbatim (internal/k8s/job.go, dispatcher.go), and the inline daemon's spec lives in the gitops manifests. So we add the engine at the manifest level — no agentloop code change.

1. Add a privileged dagger-engine sidecar to the agent execution pod (the agentloop Deployment today; the worker-Pod template once k8s worker mode is in use):
   - image registry.dagger.io/engine:v0.21.4 (match the CLI),
   - securityContext.privileged: true,
   - a shared emptyDir mounted at the engine socket dir in both the engine and agent containers,
   - a persistent cache volume for /var/lib/dagger (hostPath on this single node, or a PVC) so the buildkit cache survives pod restarts and check-fast isn't cold every time.
2. Point the agent container at it: set _EXPERIMENTAL_DAGGER_RUNNER_HOST=unix:///run/dagger/engine.sock (the shared socket) on the agentloop/worker container env.
3. Confirm the namespace allows privileged pods (no PodSecurity baseline or restricted enforcement on agentloop; verify before relying on it).
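
The sidecar setup described above could look roughly like the following. This is a sketch under several assumptions that are not in this issue: the agent container is named `agentloop`, the volume names `dagger-socket` and `dagger-cache`, the hostPath `/var/lib/dagger-cache`, and the engine image creating its socket at `/run/dagger/engine.sock` (implied by the runner-host value, not verified here). Whether UID 1000 can connect to a socket created by the root engine is also untested. The second function encodes the Pod Security Admission rule that only the `privileged` level admits privileged containers; treating a missing label as permissive assumes the cluster-wide default was left unchanged.

```shell
# Sketch: print a strategic-merge patch fragment that adds the engine sidecar.
# Container name "agentloop", the volume names, and the hostPath are assumptions.
render_engine_sidecar_patch() {
  engine_version="$1"   # e.g. v0.21.4; should match the CLI
  cat <<EOF
spec:
  template:
    spec:
      containers:
        - name: dagger-engine
          image: registry.dagger.io/engine:${engine_version}
          securityContext:
            privileged: true
          volumeMounts:
            - name: dagger-socket
              mountPath: /run/dagger
            - name: dagger-cache
              mountPath: /var/lib/dagger
        - name: agentloop
          env:
            - name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
              value: unix:///run/dagger/engine.sock
          volumeMounts:
            - name: dagger-socket
              mountPath: /run/dagger
      volumes:
        - name: dagger-socket
          emptyDir: {}
        - name: dagger-cache
          hostPath:
            path: /var/lib/dagger-cache
            type: DirectoryOrCreate
EOF
}

# Decide from the namespace's pod-security.kubernetes.io/enforce label value
# whether privileged containers are admitted. "baseline" and "restricted"
# both reject them; an empty value is treated as "no enforcement".
psa_allows_privileged() {
  case "$1" in
    ""|privileged) return 0 ;;
    *) return 1 ;;
  esac
}
```

`render_engine_sidecar_patch v0.21.4` prints the fragment for merging into the gitops manifest. The label to feed into `psa_allows_privileged` can be read from the output of `kubectl get namespace agentloop --show-labels`.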

Alternative (if we'd rather not run a privileged sidecar per pod): a single shared, privileged dagger-engine Deployment with a cache PVC, reached via _EXPERIMENTAL_DAGGER_RUNNER_HOST=kube-pod://... (requires kubectl + RBAC in the agent image, which we don't have today) or tcp:// (unauthenticated — must be locked down with a NetworkPolicy). The sidecar is preferred because it's self-contained and needs no extra RBAC/networking.

sharedinbox-side changes (this repo)

4. Align the engine version: make ci/dagger.json's engineVersion compatible with the running engine + CLI 0.21.4 (bump to v0.21.4, or re-pin the CLI; pick one and keep flake.nix, the engine image, and dagger.json in lockstep).
5. Remove the skip fallback in scripts/precommit_dart_check.sh. Once an engine is reliably present, drop the silent exit 0. If we want a guard at all, invert it so a missing engine is a hard, loud failure in the agent context (so an engine regression is caught immediately) rather than a silent skip.
6. Make sure the dev shell / hook does not clobber the injected _EXPERIMENTAL_DAGGER_RUNNER_HOST (cf. the recent "drop dead DAGGER_HOST export" cleanup); the pod env must win.
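
An inverted guard for the hook might look like this. It is a sketch only: the function name `require_runner_host_engine` is hypothetical, the current contents of scripts/precommit_dart_check.sh are not reproduced here, and how the script detects the agent context is left open. The guard only reads the injected variable and never reassigns it, so the pod env wins. For a `unix://` address it additionally checks that the socket exists, which is a choice of this sketch rather than anything Dagger requires.

```shell
# Sketch of a guard meant for the agent context, where the runner host is
# injected by the pod env. A missing engine is a loud failure, not a skip.
require_runner_host_engine() {
  host="${_EXPERIMENTAL_DAGGER_RUNNER_HOST:-}"
  if [ -z "$host" ]; then
    echo "ERROR: _EXPERIMENTAL_DAGGER_RUNNER_HOST is not set; dart-check cannot reach a Dagger engine." >&2
    return 1
  fi
  case "$host" in
    unix://*)
      # Strip the scheme to get the filesystem path of the socket.
      sock="${host#unix://}"
      if [ ! -S "$sock" ]; then
        echo "ERROR: Dagger engine socket not found at ${sock}." >&2
        return 1
      fi
      ;;
  esac
  # Other schemes (for example tcp://) are not probed in this sketch.
  return 0
}
```

In the hook this would be called as `require_runner_host_engine || exit 1` before the dagger call, replacing the silent `exit 0`.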

Cross-repo dependency

Steps 1–3 are infra and land in the agentloop deployment manifests (guettli/gitops + the agentloop runtime image / worker PodSpec template), not in this repo. This issue tracks the goal and the sharedinbox-side changes (4–6); the manifest work should be linked from here.

Acceptance criteria

- A git commit made by the agent inside an agentloop pod runs dart-check and the Dagger engine actually executes check-fast, passing or failing on real results, with no driver for scheme "image" error and no "skipping dart-check" warning.
- The other dagger call tasks (analyze, format-write, test-*, …) can run in the same context.
- The engine version, CLI version, and ci/dagger.json engineVersion are mutually compatible.
- The engine-less skip path is removed (or inverted to a hard failure).
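
The version criterion can be checked mechanically. The sketch below takes the strictest reading of "mutually compatible", exact equality after dropping a leading `v`, which is an assumption; Dagger may tolerate more skew than that. Both function names are hypothetical, and the JSON extraction uses sed so that jq is not required.

```shell
# Compare two version strings, treating "0.21.4" and "v0.21.4" as equal.
same_dagger_version() {
  a="${1#v}"
  b="${2#v}"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Print the engineVersion value from a dagger.json file (first match only).
read_engine_version() {
  sed -n 's/.*"engineVersion"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}
```

For example, `same_dagger_version "$(read_engine_version ci/dagger.json)" 0.21.4` fails with the values described in this issue (v0.20.8 against 0.21.4) and succeeds once dagger.json is bumped to v0.21.4.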

Context

Surfaced while fixing the agentloop plan-commit flow (the plan bookkeeping commit was wrongly running this project's pre-commit hooks; fixed in guettli/agentloop#173 with --no-verify). That unblocked planning, but the underlying inability to run Dagger in-pod remains and is what this issue addresses.

guettli commented

2026-06-08 07:11:02 +00:00

(Migrated from codeberg.org)

No. I will make remote Dagger available in the agent pods instead.

Reference: guettli/sharedinbox#538