## Summary
Closes#542.
- Bumped `ci/dagger.json` `engineVersion`, the Forgejo runner Dockerfile (`.forgejo/Dockerfile`), and the example `dagger-engine.service` unit in `DAGGER.md` from `0.20.8` -> `0.21.4` so they match the running engine and the CLI already pinned by `flake.nix`.
- Added `scripts/check_dagger_versions.sh` which parses the four pinned references and fails if any drift apart.
- Wired the lint into `Taskfile.yml` (`task check-dagger-versions`) and `.pre-commit-config.yaml` (triggered when any of the four pinned files change).
## Verification
- `./scripts/check_dagger_versions.sh` -> passes, all four references at `v0.21.4`.
- Temporarily edited `ci/dagger.json` to `v0.21.3` and re-ran the script: exits non-zero with a clear "out of sync" error.
Generated with Claude Code.
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/544
Closes#451
## What changed
Replaces the default Flutter blue logo with the project's rainbow-rings `icon.svg` on all supported platforms.
**Android** — all five mipmap densities regenerated (`mdpi` 48px through `xxxhdpi` 192px).
**Linux** — `linux/sharedinbox.png` (512×512) added, installed next to the binary via `CMakeLists.txt`, and set as the GTK window icon via `gtk_window_set_icon_from_file` in `my_application.cc`.
**Tooling** — `icon.png` (1024×1024 source raster) committed; `flutter_launcher_icons` added as dev dep with a `flutter_icons` config block; `task generate-icons` added to `Taskfile.yml` for future regeneration; `librsvg` added to `flake.nix` so `rsvg-convert` is available inside `nix develop`.
## How verified
Icons were generated with Inkscape from `icon.svg` and visually confirmed (rainbow-rings design appears correctly at all sizes). The `playstore/icon.png` was already correct and unchanged.
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/459
## Summary
- Deletes `scripts/build_android_bundle_local.sh`, which required a host Android SDK and failed with `No Android SDK found`
- Removes the `build-android-bundle-local` Taskfile task that invoked it
- Rewrites `deploy-android-bundle` to call the existing Dagger `publish-android` pipeline (build → stamp versionCode → sign → upload) via `sops exec-env` for local secret injection — no local Android SDK needed
The `publish-android` Dagger function (`ci/main.go`) already handles everything the old script did (keystore decode, AAB build, signing) plus version-code stamping, so no changes to `ci/main.go` are required.
Closes#444🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/449
Closes#415
## Summary
- Adds missing `timeout-minutes` to `ci.yml` (`check` job, 60 min) and `windows-nightly.yml` (90 min, ready for when the Windows runner is registered)
- Wraps `ssh-keyscan` and `ssh -f -N -L` tunnel creation in `setup_dagger_remote.sh` with `timeout 30`; emits a `::warning::` annotation when either takes more than 10 s
- Adds `timeout --kill-after=10 <N>` to all bare `dagger call` invocations in `Taskfile.yml`: 600 s for test/query tasks, 1800 s for build/deploy tasks, 60 s for `ci-graph`; `stalwart` and `check-dagger` (already protected) left untouched
- Adds `timeout --kill-after=10 2400` per attempt in `run_firebase_test.sh`; emits `::warning::` on exit 124 instead of silently retrying
## Test plan
- CI passes on this PR (the `check` job now has `timeout-minutes: 60` and will self-enforce)
- All `dagger call` lines in `Taskfile.yml` now have a `timeout` prefix (visible in the diff)
- `setup_dagger_remote.sh` logic is unchanged — only the two network calls are wrapped
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/432
Keystore is decoded into /dev/shm (tmpfs, RAM-only) during the build
and cleaned up on exit — never written to physical disk. ANDROID_KEYSTORE_PATH
is now required with no fallback; missing it fails loudly. Dagger CI path
updated to write to /tmp and set ANDROID_KEYSTORE_PATH accordingly.
Also fix check_ci_images.sh: filter out incomplete image tags ending in ':'
that arise from dynamic From("image:"+variable) concatenations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- Adds `scripts/check_ci_images.sh`: extracts every `From("...")` image reference from `ci/main.go` and runs `skopeo inspect --no-creds` on each one (manifest-only, no layer pull, no daemon required)
- Adds `task check-ci-images` task in `Taskfile.yml` that runs the script
- Adds `ci-image-exists` hook to `.pre-commit-config.yaml` that fires only when `ci/main.go` is staged (using `files: ^ci/main\.go$` rather than `always_run`, to avoid a network round-trip on every unrelated commit)
- Adds `skopeo` to the Nix devShell so the tool is on PATH when the hook runs via `nix develop --command`
This catches a bad image tag (like `ghcr.io/cirruslabs/flutter:3.44.1` not yet published) at commit time, before the push reaches CI.
## Test plan
- Stage a change to `ci/main.go` bumping a `From("...")` tag to a non-existent version → hook rejects commit with NOT FOUND
- Stage a change with valid image tags → hook prints OK for each image and allows the commit
- Stage a change to any other file → `ci-image-exists` hook is skipped entirely
Closes#407
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/413
## Summary
- Wraps the \`dagger call\` in \`deploy-linux\`, \`publish-android\`, and \`deploy-apk\` Taskfile tasks with \`scripts/silent_on_success.sh\`
- On success: no Dagger output is printed (eliminates the verbose logs seen in deploy.yml CI runs)
- On failure: full Dagger output is replayed so errors remain visible
The project already uses \`scripts/silent_on_success.sh\` for other noisy commands (fvm, flutter pub get, build_runner, etc.) — this applies the same pattern to the three deploy tasks called from \`.forgejo/workflows/deploy.yml\`.
Closes#389
## Test plan
- [ ] Verify deploy CI run produces significantly less output on success
- [ ] Verify that on a failure, the full Dagger output is still printed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/390
## Root cause
`BuildWebsite` and `PublishWebsite` in `ci/main.go` ran `hugo --minify` without setting the `HUGO_PARAMS_GITVERSION` environment variable. Hugo maps that env var to `site.Params.gitversion`, which the `website/layouts/_partials/extend_head.html` template uses to render `<meta name="x-version" content="...">` in the page `<head>`.
Without that meta tag, `website-verify.sh` (which greps for `x-version.*${VERSION}` in the live HTML) always timed out and reported failure — even though the site itself was deployed successfully.
## Fix
- Added an optional `commitHash` parameter to `BuildWebsite` and `PublishWebsite` in `ci/main.go`. When provided, it is passed to the Hugo container via `WithEnvVariable("HUGO_PARAMS_GITVERSION", commitHash)` — consistent with how `BuildLinuxRelease` and friends already inject `GIT_HASH`.
- Updated `task publish-website` in `Taskfile.yml` to compute `HASH=$(git rev-parse --short HEAD)` and forward it as `--commit-hash "$HASH"` — matching the pattern used by `task deploy-linux`.
## Verification
- `gofmt` passes on the modified `ci/main.go`.
- The logic mirrors the existing `BuildLinuxRelease` pattern that already works in CI.
Closes#360
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/362
## Summary
Closes#349
Two bugs prevented `check-dagger` from failing fast when checks failed:
- **Hygiene + Layers checked sequentially** — they are cheap structural checks with no dependency on each other. Running them in parallel (`errgroup.Group`) means failures are reported sooner.
- **Spurious retries from `errgroup.WithContext`** — the backend and integration tests previously shared a derived context via `errgroup.WithContext`. When one test failed, the context was cancelled, causing the sibling test to emit `"context canceled"` in Dagger's `--progress=plain` output. The `retry_dagger` function in `Taskfile.yml` matched that string as a transient network error and re-ran the entire pipeline up to 3 times — a real test failure could take 30+ minutes to be reported instead of ~10.
**Fix in `ci/main.go`:**
- Hygiene + layers now run in parallel with `errgroup.Group`
- Backend + integration tests now use `errgroup.Group` (no shared cancel context), so a failure in one does not emit `"context canceled"` for the other
**Fix in `Taskfile.yml`:**
- Removed `context canceled` from the `retry_dagger` grep pattern; the remaining patterns (`connection reset`, `context deadline exceeded`, `connection refused`, `invalid return status code`) still cover genuine network/engine transients
## Test plan
- [ ] Confirm the Forgejo CI run completes and, when a check fails, it fails fast (no 3× retry loop in logs)
- [ ] Verify `task check-dagger` still retries on actual connection errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Co-authored-by: guettli <guettli@noreply.codeberg.org>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/350
Three unguarded blocking calls caused CI to hang until the 60-min timeout:
- dagger query prune steps had no timeout; || true only catches errors, not hangs
- docker info (added in d905cd6) had no timeout if Docker socket is unresponsive
- until portfile loop in check-dagger spun forever if otel-receiver.py crashed
Fixes: timeout 120 on all dagger query prune calls, timeout 30 on docker info,
and a kill -0 process-alive guard on the portfile until loop with fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a Renovate() Dagger function using the forgejo platform and a
.forgejo/workflows/renovate.yml workflow triggered at 06:00 UTC daily.
Uses RENOVATE_FORGEJO_TOKEN secret; no dedicated Renovate service account needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
env:SSH_PRIVATE_KEY passes the key through shell $() which strips the
trailing newline, causing dagger to write a truncated key that OpenSSH
rejects with "error in libcrypto". Using file: reads it directly from
disk, preserving exact content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add scripts/run_firebase_test.sh that strips ANSI codes and removes
UP-TO-DATE task lines, libsqlite warnings, Gradle deprecation notices
and other high-volume noise before it hits the CI log.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements issue #132. Builds debug app APK + androidTest APK via Dagger,
then runs them on Firebase Test Lab using the FIREBASE_TEST_LAB_SERVICE_ACCOUNT_KEY
secret and FIREBASE_PROJECT_ID variable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GET /shutdown endpoint on otel-receiver.py is the one clean shutdown
path. cleanup() only needs to remove temp files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename ci/otelrecv.py to ci/otel-receiver.py for readability.
Replace SIGTERM+wait shutdown (which could hang indefinitely) with an
HTTP-based approach: add GET /shutdown to otel-receiver.py that calls
self.server.shutdown() directly. After dagger call returns, curl that
endpoint so the receiver prints its timing report and exits cleanly.
Cleanup is reduced to a SIGKILL fallback in case the process is already
gone.
Also fix the do_GET handler to reference self.server instead of the
local variable server, which was inaccessible from the handler class.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove per-request debug logs from otelrecv.py (POST, decoding,
decoded, 200 sent, signal) that were added to diagnose the CI hang,
which has since been resolved.
Remove verbose [HH:MM:SS] timestamp messages from check-dagger
(start, pipeline done, otelrecv started/ready, final RC, cleanup
start/done) for the same reason.
Fix cleanup to send SIGTERM + wait instead of SIGKILL so the OTEL
timing report is actually printed at the end of each CI run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a Ci.Graph() Dagger function that emits a Mermaid flowchart showing
both the Dagger Check pipeline (toolchain → pubGetLayer → parallel steps)
and the Codeberg CI job dependencies (check → build-linux / deploy-playstore
→ publish-website).
Usage: dagger call -m ci --source=. graph
task ci-graph
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wait "$RECV_PID" was blocking despite kill -9 (possibly because $RECV_PID
was garbled by ANSI escape codes from dagger output, making kill target the
wrong PID). Fix:
- Remove wait entirely — zombie is reaped when the shell exits
- Add pkill -9 -f otelrecv.py as fallback in case kill-by-PID misses
- Log PID at capture time to verify correctness in CI logs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three changes:
- cleanup() now uses kill -9 instead of kill (SIGTERM) to prevent wait hanging
if otelrecv's signal handler stalls
- adds [HH:MM:SS] log lines at key points so CI logs show exactly where time is spent
- restores OTEL env vars (via env VAR=val) since they were confirmed not to cause the hang
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dagger ignores SIGTERM, keeping the pipe's write end open; tee can never
get EOF and the script hangs. --kill-after=10 follows up with SIGKILL which
closes the pipe and unblocks the script.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On network errors (connection reset, context canceled, connection refused)
retry the dagger call rather than failing immediately. Real test failures
propagate without retry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dagger call hangs after function completion due to HTTP/2 teardown bug in
remote engine mode. Capture output via tee; if timeout fires but output
contains "All tests passed", exit 0 instead of 124.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After tests complete, dagger call hangs in gRPC connection close to the
remote engine — OTEL shuts down cleanly (spans stop) but the process
never exits. Wrapping with timeout 900s and treating exit 124 as success
unblocks CI and lets the OTEL timing report print.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
http/json is not supported by the Go OTEL SDK used in Dagger v0.20.8.
Switch to http/protobuf (the SDK default) and rewrite the Python receiver
to decode binary protobuf using stdlib struct — no pip required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dagger v0.20.8 only supports 'grpc' and 'http/protobuf' OTLP protocols;
'http/json' triggers a WARN and exports nothing. The new approach pipes
dagger's --progress=plain output through a Python script that echoes it
in real-time and prints a timing table at EOF. No HTTP server, no port
files, no protocol issues — works locally and in CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
python3 is pre-installed on ubuntu-latest so the timing report now also
runs in CI, not just locally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TIMINGFILE=$(mktemp) was an unnecessary /tmp path. The receiver already
prints its report to stdout on shutdown; wait $RECV_PID captures it in
place. Only PORTFILE remains in /tmp (unique via mktemp, deleted in cleanup).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ci/otelrecv/main.go — a minimal OTLP HTTP/JSON trace receiver that
listens on a random port (port 0) so parallel runs never collide.
The check-dagger Taskfile task now starts the receiver in the background,
passes the port via a mktemp file, runs dagger with OTEL env vars set,
then prints a per-span timing report on shutdown. Falls back to plain
dagger call when Go is not available (e.g. CI containers without Go).
First run will show raw attribute keys so we can learn Dagger's exact
telemetry format and refine the cached/live detection logic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BuildAndroidRelease() drops all params and builds with --build-number 1
(no keystore injected, Gradle uses debug signing). The command is now
stable across all commits — full Dagger cache hit whenever source is
unchanged.
Three new Dagger functions handle the post-cache steps:
- StampAndroidVersionCode(aab, versionCode): pure-stdlib Python patches
the AAB's compiled manifest proto (android:versionCode resource ID
0x0101021b) and strips META-INF/ to clear the old signature.
- SignAndroidBundle(aab, keystoreBase64, keystorePassword): decodes the
base64 keystore secret and re-signs with jarsigner.
- PublishAndroid(ctx, playStoreConfig, keystoreBase64, keystorePassword):
chains all three + UploadToPlayStore, computing time.Now().Unix() as
the versionCode internally.
Taskfile: build-android-bundle simplified (no keystore params); publish-
android now calls publish-android in a single Dagger call instead of the
two-step build-then-upload.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
$(date +%s) changed every run, making the flutter build WithExec args
unique each time and busting the Dagger layer cache (500s build every run).
$(git log -1 --format=%ct HEAD) is stable for the same commit, so a
retry of a failed upload gets a full cache hit on the build step.
Still monotonically increasing across commits, satisfying Play Store.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The old fvm-based task had the same name as the new Dagger-based one,
causing go-task to error immediately (1-second CI failure).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>