Adds the Dagger gRPC/HTTP disconnect error to the retry pattern
so transient engine drops during long-running steps (like build_runner)
auto-recover instead of failing the CI job.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The DinD service approach was crashing the job (exit 2) because the
Forgejo runner on this host does not honour the `options: --privileged`
field for service containers, so dockerd inside DinD could never start.
Root cause of the broader CI failure: dagger-stunnel.service stopped
cleanly (exit 0 → no auto-restart), leaving port 8774 without a
listener. A plain socat TCP proxy (8774→1774) is now running on the
host as a stop-gap until stunnel is restarted.
Changes:
- Remove the docker:27-dind service container from ci.yml entirely
- Simplify the "Locate Docker daemon" step: warn instead of failing when
  Docker is unavailable (the job then fails later at the Dagger step
  with a clearer message)
- Add a plain-TCP path to setup_dagger_remote.sh: after a successful nc
  probe, try `dagger version` directly against the target host:port
  before falling back to the TLS stunnel setup; this works with both the
  socat plain-TCP proxy and any future plain-TCP Dagger engine exposure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The act runner on Codeberg may not apply the services.env block to the
DinD container, so DOCKER_TLS_CERTDIR defaults to /certs and dockerd
starts with TLS on port 2376 instead of 2375. Fix by passing
--env DOCKER_TLS_CERTDIR= directly via options: so it is always applied
at docker run time.
Also:
- Try the host Docker socket (DooD) before DinD; many self-hosted
  runners mount /var/run/docker.sock, and this is simpler and more reliable.
- Remove the workflow-level DOCKER_HOST override; let the step discover
  and export the correct value instead of pre-forcing tcp://docker:2375.
- Retry DinD by hostname for up to 60 s before falling back to scanning.
- Add a DNS resolution check (getent hosts docker) and a port 2376 probe
  that clearly surfaces the TLS-still-enabled diagnostic message.
- Improve final diagnostics (IPs, DNS, socket path) to aid future debugging.
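
The socket-first discovery order described above could be sketched roughly as follows. The function name `pick_docker_host` and its parameters are hypothetical, chosen for illustration; only the socket path and the tcp://docker:2375 endpoint come from this message.

```shell
#!/usr/bin/env bash
# Print a DOCKER_HOST value, preferring the host socket (DooD) over DinD.
# $1: socket path to check (assumed default: /var/run/docker.sock)
# $2: DinD TCP endpoint to fall back to (assumed default: tcp://docker:2375)
pick_docker_host() {
  local sock="${1:-/var/run/docker.sock}"
  local dind="${2:-tcp://docker:2375}"
  if [ -S "$sock" ]; then
    # A mounted socket means the host daemon is directly usable.
    echo "unix://$sock"
  else
    echo "$dind"
  fi
}
```

A step would then run something like `export DOCKER_HOST="$(pick_docker_host)"` rather than relying on a workflow-level override.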
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous fallback only scanned .1-.50 of the first interface's
subnet, missing the DinD container when its IP is higher (.51+) or
when the forgejo-jobs network is on a different interface than
hostname -I returned first.
Now iterates over all non-loopback IPs from hostname -I, scans each
subnet's full /24 (.1-.254), and uses a 0.3 s bash /dev/tcp probe
instead of nc -zw1 to keep the worst-case scan time under ~80 s per
subnet (254 × 0.3 s ≈ 76 s).
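
A minimal sketch of that scan, assuming GNU `timeout` (which accepts fractional seconds) and bash are present in the runner image. The helper names are hypothetical; the port is passed in by the caller.

```shell
#!/usr/bin/env bash
# Print every IPv4 address from the argument list except loopback.
# (Defensive filter: IPv6 entries are skipped as well.)
non_loopback_ips() {
  local ip
  for ip in "$@"; do
    case "$ip" in
      127.*|*:*) ;;
      *) echo "$ip" ;;
    esac
  done
}

# Return 0 if host $1 accepts a TCP connection on port $2 within 0.3 s.
# /dev/tcp is a bash feature, so the probe runs in an explicit bash.
probe_port() {
  timeout 0.3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# $1: port; remaining args: candidate local IPs (e.g. from `hostname -I`).
# Scans .1-.254 of each address's /24 and prints the first host that answers.
scan_subnets() {
  local port="$1"; shift
  local ip prefix i
  for ip in $(non_loopback_ips "$@"); do
    prefix="${ip%.*}"
    for i in $(seq 1 254); do
      if probe_port "$prefix.$i" "$port"; then
        echo "$prefix.$i"
        return 0
      fi
    done
  done
  return 1
}
```

Usage would look like `scan_subnets 2375 $(hostname -I)`. A refused connection returns immediately, so the 0.3 s ceiling only applies to addresses that do not answer at all.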
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The runner image does not have iproute2 installed, so `ip route` fails
with exit 127. Use `hostname -I` (present in the runner image) to get the
container's own IP and derive the /24 prefix for the DinD port scan.
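
The prefix derivation can be done with plain parameter expansion, as in this sketch. The function names are hypothetical, and the address in the usage note is an example value, not one taken from the CI logs.

```shell
#!/usr/bin/env bash
# Print the first address from a whitespace-separated list such as
# the output of `hostname -I`.
first_ip() {
  # Unquoted on purpose: word splitting separates the addresses.
  set -- $1
  echo "$1"
}

# Print the /24 prefix (first three octets) of an IPv4 address
# by stripping the last dot-separated field.
subnet_prefix() {
  echo "${1%.*}"
}
```

For example, `subnet_prefix "$(first_ip "$(hostname -I)")"` would print `10.89.0` for a container whose first address is 10.89.0.7.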
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The docker:27-dind service container needs --privileged to start dockerd;
without it the container exits immediately and its DNS alias is removed,
causing the embedded DNS to return SERVFAIL for 'docker'.
Codeberg's act runner may also not register the service key as a network
alias at all. Add a 'Locate Docker daemon' step that tries the configured
DOCKER_HOST first, then falls back to scanning the local /24 for port 2375
so the local Dagger engine can connect to DinD regardless.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the remote Dagger engine (stunnel/port 8774) is unreachable, Dagger
falls back to a local engine which requires a Docker daemon. The job container
does not have /var/run/docker.sock mounted, so the fallback was failing with
"connect: no such file or directory".
Add a docker:27-dind service to the CI job and set DOCKER_HOST=tcp://docker:2375
so Dagger can start a local engine when the remote engine is unavailable.
Also guard the Firebase and Play Store steps in deploy.yml so they are skipped
gracefully when the relevant secrets are not configured.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The remote Dagger engine probe exits with an error when the server is
down, failing CI before any tests run. Change the probe to exit 0 on
timeout and print a warning instead; with _DAGGER_RUNNER_HOST unset,
Dagger will start a local engine and CI can still complete.
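
The non-fatal probe could look roughly like this. The function name, the injectable probe command, and the `tcp://host:port` value format are assumptions made for illustration; the variable name is the one used in this message.

```shell
#!/usr/bin/env bash
# $1, $2: engine host and port; remaining args: the probe command to run.
# Always returns 0 so an unreachable engine cannot fail the job here.
setup_remote_engine() {
  local host="$1" port="$2"
  shift 2
  if "$@"; then
    # Assumed value format; the real script may point at a local tunnel port.
    export _DAGGER_RUNNER_HOST="tcp://$host:$port"
  else
    echo "WARNING: Dagger engine at $host:$port unreachable; falling back to a local engine" >&2
    unset _DAGGER_RUNNER_HOST
  fi
  return 0
}
```

A caller might pass the real check as the probe, for example `setup_remote_engine "$host" "$port" nc -z -w 5 "$host" "$port"`.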
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Dagger engine stopped responding (connection refused) after the
previous run exhausted disk space and crashed it. Two changes:
1. setup_dagger_remote.sh: retry the nc probe up to 5 times with 30 s
   delays so a transient crash/restart window doesn't immediately fail
   the job.
2. ci.yml: add a post-check prune step (if: always()) so the engine
   cache is cleaned up after every run, reducing the chance of disk
   exhaustion on the next run.
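
The probe retry in item 1 amounts to a fixed-delay retry loop. A sketch, with a hypothetical `retry` helper that takes the attempt count and delay as arguments:

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    echo "attempt $n/$attempts failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

With the values from this commit, the call would be along the lines of `retry 5 30 nc -z -w 5 "$host" "$port"`, which gives the engine roughly two minutes to come back before the job fails.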
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous fix (retry × 3 with 60 s sleep) was not enough: all three
attempts still failed because the engine cache stayed full throughout.
Add an explicit `dagger query '{ engine { localCache { prune } } }'`
call (a) as a proactive step in ci.yml right after the stunnel setup,
and (b) inside the retry handler before each back-off sleep (now 90 s
instead of 60 s). The prune evicts stale execution-cache snapshots
(e.g. old pubspec.lock layers) so fresh disk is available when flutter
pub get runs. The `|| true` guard makes the prune non-fatal if the
query syntax changes between Dagger versions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dagger engine occasionally runs out of disk during `flutter pub get`
when multiple CI jobs run in parallel. Space typically frees up within
~60 seconds as other containers finish. Add "No space left on device"
as a retryable condition with a 60 s back-off so PR runs survive the
transient shortage (run 4199480 was the trigger).
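
Treating one error string as retryable can be sketched as below. The helper names are hypothetical, and the real retry handler may match more patterns than the single one named in this message.

```shell
#!/usr/bin/env bash
# Return 0 when the captured output contains the retryable error.
is_retryable() {
  printf '%s\n' "$1" | grep -qF "No space left on device"
}

# run_with_backoff MAX_ATTEMPTS BACKOFF_SECONDS COMMAND [ARGS...]
# Retries only when the failure output matches is_retryable;
# any other failure is returned to the caller immediately.
run_with_backoff() {
  local max="$1" backoff="$2"
  shift 2
  local n out status
  for n in $(seq 1 "$max"); do
    if out="$("$@" 2>&1)"; then status=0; else status=$?; fi
    printf '%s\n' "$out"
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if ! is_retryable "$out"; then
      return "$status"
    fi
    if [ "$n" -lt "$max" ]; then
      sleep "$backoff"
    fi
  done
  return "$status"
}
```

With a 60 s back-off the call would resemble `run_with_backoff 3 60 <step command>`, where the attempt count is an assumption.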
Both isCoreLibraryDesugaringEnabled = true in compileOptions and the
coreLibraryDesugaring("com.android.tools:desugar_jdk_libs:2.1.4")
dependency are already present in android/app/build.gradle.kts from
the earlier fix in #37. This commit closes issue #183, which was opened
to track the same requirement.