Commit Graph
672 Commits
Author SHA1 Message Date
Thomas SharedInboxandClaude Sonnet 4.6 7e155f5785 fix(ci): kill dagger via timeout when it hangs in gRPC teardown
After tests complete, dagger call hangs in gRPC connection close to the
remote engine — OTEL shuts down cleanly (spans stop) but the process
never exits. Wrapping with timeout 900s and treating exit 124 as success
unblocks CI and lets the OTEL timing report print.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 12:36:13 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 95d114cc38 debug(otelrecv): add stderr logging to diagnose CI hang
Log each POST request, decode step, 200 response, signal receipt, and
server shutdown to understand where the hang occurs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 12:22:04 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d5e3974d94 fix(otelrecv): send explicit Content-Length + Connection: close
Without Content-Length the Go HTTP/1.1 client can't tell the response
body is empty, causing dagger call to hang waiting for more data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 12:07:57 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 1c27dc4f71 fix(ci): use http/protobuf OTEL protocol with binary protobuf receiver
http/json is not supported by the Go OTEL SDK used in Dagger v0.20.8.
Switch to http/protobuf (the SDK default) and rewrite the Python receiver
to decode binary protobuf using stdlib struct — no pip required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 11:46:58 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 691f2beec2 fix(ci): switch timing from OTEL receiver to --progress=plain pipe filter
Dagger v0.20.8 only supports 'grpc' and 'http/protobuf' OTLP protocols;
'http/json' triggers a WARN and exports nothing.  The new approach pipes
dagger's --progress=plain output through a Python script that echoes it
in real-time and prints a timing table at EOF.  No HTTP server, no port
files, no protocol issues — works locally and in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 11:43:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ac2178916e refactor(ci): replace Go OTEL receiver with Python (stdlib, no deps)
python3 is pre-installed on ubuntu-latest so the timing report now also
runs in CI, not just locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 11:30:08 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b10696a41e fix(ci): remove tmp timing file — receiver writes directly to stdout
TIMINGFILE=$(mktemp) was an unnecessary /tmp path. The receiver already
prints its report to stdout on shutdown; wait $RECV_PID captures it in
place. Only PORTFILE remains in /tmp (unique via mktemp, deleted in cleanup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 10:38:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 3471e1fd2c feat(ci): OTEL timing receiver for check-dagger
Adds ci/otelrecv/main.go — a minimal OTLP HTTP/JSON trace receiver that
listens on a random port (port 0) so parallel runs never collide.

The check-dagger Taskfile task now starts the receiver in the background,
passes the port via a mktemp file, runs dagger with OTEL env vars set,
then prints a per-span timing report on shutdown. Falls back to plain
dagger call when Go is not available (e.g. CI containers without Go).

First run will show raw attribute keys so we can learn Dagger's exact
telemetry format and refine the cached/live detection logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 10:27:57 +02:00
Thomas SharedInbox f23328fd1f ci: empty commit to verify cache stability 2026-05-20 09:39:47 +02:00
Thomas SharedInbox 41ac45a92e ci: empty commit to retry after stunnel fix 2026-05-20 09:18:25 +02:00
Thomas SharedInbox c090a320f6 ci: empty commit to verify cache after disk cleanup + parallelization 2026-05-20 09:07:57 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2748517d1c feat(ci): run TestBackend and TestIntegration in parallel
Saves ~1 minute on every CI run by starting the integration test build
concurrently with the backend Stalwart tests instead of waiting for them
to finish first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 08:58:55 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2fd82eadc4 ci: empty commit to verify dart analyze + unit test caching
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 00:09:39 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 e0a99d7d63 fix(ci): remove --no-pub from integration tests; use dart analyze instead of flutter analyze
Integration tests build native Linux app via CMake which requires pub get side effects
(plugin registrant file generation) — --no-pub broke the CMake step.

Switch flutter analyze to dart analyze --fatal-infos to eliminate the flutter wrapper's
non-deterministic state writes to /root/.dartServer/, which were preventing action cache
hits on the analyze step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:57:25 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 7fce683375 fix(ci): add --no-pub to flutter analyze and flutter test execs
Without --no-pub, flutter re-runs pub get internally before each
analyze/test call, writing a fresh package_config.json with new
timestamps. This makes the exec output snapshot non-deterministic
and prevents BuildKit from caching the result across CI runs.

With --no-pub, flutter uses the package_config.json already produced
by pubGetLayer(), and the exec output is stable → persistent cache hits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:30:47 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d369c26470 ci: empty commit to verify cache propagation to dart format and analyze
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:19:19 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ef4bcd1eeb ci: empty commit to verify cache stability after dart-tool-build mount removal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:03:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 39204fabcd fix(ci): remove dart-tool-build cache mount from setup()
Shared mutable cache mounts prevent BuildKit from persistently caching
the exec result across sessions. Without the mount, build_runner output
is stored in the content-addressed snapshot and survives GC cycles,
allowing downstream analyze/test steps to also be stably cached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:46:39 +02:00
Thomas SharedInbox b2a15aee09 ci: empty commit 2026-05-19 22:25:31 +02:00
Thomas SharedInbox 8f66047a2e ci: empty commit — run 369, verify full caching under 50GB reservedSpace 2026-05-19 22:23:54 +02:00
Thomas SharedInbox 4eb06d487a ci: empty commit — verify snapshot retention under 50GB reservedSpace 2026-05-19 22:11:26 +02:00
Thomas SharedInbox 07d90f7d50 ci: empty commit to verify pub-get exec-cache survival after run-365 crash 2026-05-19 19:11:55 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b090342637 fix(ci): revert bad /root/.flutter cache mount — it is a file, not a directory
WithMountedCache requires a directory. /root/.flutter in the cirruslabs/flutter
image is a plain text file (Flutter SDK marker), causing "not a directory" at
container startup. Reverts to the pre-365 Base() so run-364 exec cache entries
are still valid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 19:11:40 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 6dcefa856e fix(ci): mount /root/.flutter as cache volume to keep pub-get snapshot small
Flutter writes tool state to /root/.flutter on every invocation. Without a
cache mount this ends up in the pub-get snapshot, making it large and prone
to GC eviction. Moving it to a cache volume keeps the snapshot tiny so
BuildKit's exec cache for pub get survives between CI runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 19:00:45 +02:00
Thomas SharedInbox 1393a31d61 ci: empty commit to verify pub get fully cached after date_created fix 2026-05-19 18:45:40 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 59dfb0cfb3 fix(ci): also strip date_created from .flutter-plugins-dependencies
flutter pub get writes a date_created timestamp into .flutter-plugins-
dependencies in addition to the generated field in package_config.json.
Both files are part of the pub-get execution snapshot, so both timestamps
must be removed to make the layer deterministic and cacheable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 18:33:20 +02:00
Thomas SharedInbox df0c3910cb ci: empty commit to verify pub get determinism caching 2026-05-19 17:59:50 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 4daf47f7a3 fix(ci): make pub get layer deterministic to enable test caching
Remove non-deterministic "generated" and "generatorVersion" fields from
.dart_tool/package_config.json after flutter pub get, so the snapshot
hash is stable across runs and all downstream test steps can be cached.
Mount only .dart_tool/build as a mutable cache volume so the incremental
build graph persists without polluting the deterministic snapshot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 17:39:20 +02:00
Thomas SharedInbox dd9bd24f09 perf(ci): cache pub get separately from source to fix downstream cache misses
flutter pub get embeds a timestamp in .dart_tool/package_config.json, making
its output snapshot non-deterministic and busting the cache for dart format,
flutter analyze, unit tests, mocks, and integration tests on every run.

Fix: isolate pub get into its own layer using only pubspec.yaml + pubspec.lock
as inputs, then normalise the generated timestamp. setup() now overlays the
full source on top of this stable layer before running build_runner.

Result: on an empty commit, all steps downstream of pub get should be cached.
2026-05-19 16:59:19 +02:00
Thomas SharedInbox 1e0679c324 ci: second empty commit — sdkmanager should be cached now 2026-05-19 16:06:21 +02:00
Thomas SharedInbox d826522072 ci: empty commit to verify sdkmanager cache after GC policy fix 2026-05-19 15:55:18 +02:00
Thomas SharedInbox ec60566a33 ci: empty commit to verify Dagger cache after disk pressure resolved 2026-05-19 14:21:16 +02:00
Thomas SharedInbox 00625e318a ci: empty commit to verify SDK pre-install caching 2026-05-19 11:57:49 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 354f7959f6 fix(ci): pre-install Android SDK components in container layer
Cache volumes for NDK/CMake proved unreliable on the remote Dagger
engine: the android-ndk-cache volume was empty on each run, causing
Gradle to re-download NDK + CMake + build-tools + platform during every
`flutter build appbundle` (~3-4 min of extra downloads).

Pre-install all four SDK components via sdkmanager in Base() so Dagger's
execution cache captures them. Base() is CACHED on subsequent runs with
identical inputs, eliminating the per-run SDK downloads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 11:35:44 +02:00
Thomas SharedInbox 75a7b947cd ci: retry after transient ghcr.io 502 2026-05-19 11:11:54 +02:00
Thomas SharedInbox fda0210bd0 ci: empty commit to verify source-scoped Dagger cache hits 2026-05-19 11:09:15 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 9e709873b9 refactor(ci): scope source inputs per pipeline — android/linux builds no longer bust on unrelated changes
Base() no longer mounts m.Source. Each function gets only the files it
needs via a narrow filter, so Dagger's content-addressed cache is scoped
correctly: changing website/, scripts/, or stalwart-dev/ no longer
invalidates the Android or Linux build cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 10:52:57 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 45c3a8088b ci: empty commit to verify Dagger cache hits
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 18:44:30 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 f60beaa199 fix: XmlNode.element is at proto field 1, not 2 — versionCode patch was silently skipping all elements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:49:10 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d9cde7cacf debug: dump manifest proto structure when versionCode not found
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:23:44 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 bb163542bb fix: add versionCode read-back verification and handle fixed-width wire types
- _parse now handles wire types 1 (fixed64) and 5 (fixed32) so it doesn't
  crash on unknown fields in the manifest proto
- _patch_prim patches both int_decimal_value (field 6) and int_hexadecimal_value
  (field 7) — AAPT2 may use either
- patch() reads versionCode before and after patching and exits with a clear
  error if the patch didn't take effect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:11:20 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 8319002e0c fix: use publish-android task for Play Store deploy (stamps + signs + uploads)
The old workflow built with build-android-bundle (debug-signed) then uploaded
separately. publish-android stamps the versionCode, re-signs with the release
keystore, and uploads in one Dagger call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 16:42:23 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 02e8c2200a fix: fail fast with clear error when keystore secrets are empty
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 14:17:41 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 8518715bcf ci: retrigger to verify Dagger cache hits
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 13:54:23 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2d559d4947 feat: cache Android AAB build; stamp versionCode + resign after cache hit
BuildAndroidRelease() drops all params and builds with --build-number 1
(no keystore injected, Gradle uses debug signing). The command is now
stable across all commits — full Dagger cache hit whenever source is
unchanged.

Three new Dagger functions handle the post-cache steps:
- StampAndroidVersionCode(aab, versionCode): pure-stdlib Python patches
  the AAB's compiled manifest proto (android:versionCode resource ID
  0x0101021b) and strips META-INF/ to clear the old signature.
- SignAndroidBundle(aab, keystoreBase64, keystorePassword): decodes the
  base64 keystore secret and re-signs with jarsigner.
- PublishAndroid(ctx, playStoreConfig, keystoreBase64, keystorePassword):
  chains all three + UploadToPlayStore, computing time.Now().Unix() as
  the versionCode internally.

Taskfile: build-android-bundle simplified (no keystore params); publish-
android now calls publish-android in a single Dagger call instead of the
two-step build-then-upload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 13:35:20 +02:00
Thomas GüttlerandClaude Sonnet 4.6 f6bb6aed82 ci: empty commit to verify Dagger cache
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:34:21 +02:00
Thomas GüttlerandClaude Sonnet 4.6 007e7b57f1 fix: revert CacheSharingModeLocked to fix deadlock in Check()
Locked exclusive cache access caused concurrent Dagger operations inside
Check() to deadlock waiting on each other, resulting in a 60-minute timeout.
Shared mode is correct here — cache volumes are pre-warmed so pub get is fast.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:19:41 +02:00
Thomas GüttlerandClaude Sonnet 4.6 0ea06e8634 fix: use CacheSharingModeLocked instead of dagger.Locked
dagger.Locked is not exported in this SDK version; the correct
constant is dagger.CacheSharingModeLocked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 10:16:02 +02:00
Thomas GüttlerandClaude Sonnet 4.6 592efae934 perf: lock cache volumes and add --no-pub to fix Dagger cache misses
flutter pub get was not being cached by Dagger because the pub-cache
CacheVolume used Shared mode: concurrent writes from the check and
deploy-playstore jobs made the mount non-deterministic, causing a cache
miss on every run. Locked mode gives each operation exclusive access so
the output snapshot is stable and Dagger can cache subsequent steps.

Also add --no-pub to both flutter build commands: pub get already ran
explicitly in Setup(), so skipping it again inside the build step avoids
a duplicate network-touching operation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 09:45:39 +02:00
Thomas GüttlerandClaude Sonnet 4.6 9466f03936 perf: use commit timestamp as build number to enable Dagger cache hits
$(date +%s) changed every run, making the flutter build WithExec args
unique each time and busting the Dagger layer cache (500s build every run).

$(git log -1 --format=%ct HEAD) is stable for the same commit, so a
retry of a failed upload gets a full cache hit on the build step.
Still monotonically increasing across commits, satisfying Play Store.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 09:20:05 +02:00