Commit Graph
494 Commits
Author SHA1 Message Date
Thomas SharedInboxandClaude Sonnet 4.6 041e496e58 fix(ci): rename otelrecv→otel-receiver, fix teardown hang
Rename ci/otelrecv.py to ci/otel-receiver.py for readability.

Replace SIGTERM+wait shutdown (which could hang indefinitely) with an
HTTP-based approach: add GET /shutdown to otel-receiver.py that calls
self.server.shutdown() directly. After dagger call returns, curl that
endpoint so the receiver prints its timing report and exits cleanly.
Cleanup is reduced to a SIGKILL fallback in case the process is already
gone.

Also fix the do_GET handler to reference self.server instead of the
local variable server, which was inaccessible from the handler class.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 15:18:34 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 f2d24a8514 fix(ci): reduce noise in CI output (#128)
- Filter flutter pub get package-listing lines (^[+~><] ) in pubGetLayer
- Filter build_runner compilation-progress lines (^\[) in setup() and CheckMocks()
- Add -q to git commit in CheckMocks to suppress "460 files changed" stats
- Wrap flutter test in Coverage, TestBackend, TestIntegration, TestSyncReliability
  to show only the summary line on success and full output on failure
- Apply same build_runner filter to scripts/check_mocks_fresh.sh for local runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 14:51:56 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 9dc34cefe5 ci: add 30-minute Dagger-side timeout to Check pipeline
If any step hangs (stuck service, deadlocked test, network stall), the
pipeline will now cancel itself after 30 min rather than blocking the
runner indefinitely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 11:53:49 +02:00
Thomas SharedInbox f315c21c9a add "list" sub-command to agent-loop to resume via UUID. 2026-05-21 11:49:32 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 541c1a0b53 fix(ci): reduce noise in CI output (#128)
Remove per-request debug logs from otelrecv.py (POST, decoding,
decoded, 200 sent, signal) that were added to diagnose the CI hang,
which has since been resolved.

Remove verbose [HH:MM:SS] timestamp messages from check-dagger
(start, pipeline done, otelrecv started/ready, final RC, cleanup
start/done) for the same reason.

Fix cleanup to send SIGTERM + wait instead of SIGKILL so the OTEL
timing report is actually printed at the end of each CI run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 10:45:40 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 34fb51d85d feat(ci): add Graph() to visualize CI pipeline as Mermaid diagram (#126)
Adds a Ci.Graph() Dagger function that emits a Mermaid flowchart showing
both the Dagger Check pipeline (toolchain → pubGetLayer → parallel steps)
and the Codeberg CI job dependencies (check → build-linux / deploy-playstore
→ publish-website).

Usage: dagger call -m ci --source=. graph
       task ci-graph

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 10:28:28 +02:00
Thomas SharedInbox 58f1a4da42 feat(website): vendor PaperMod theme, remove git submodule (#125)
Replace the git submodule with directly tracked files so that
`git commit .` no longer fails with 'does not have a commit
checked out'. Removed .github/ from the vendored copy since
upstream CI workflows are not needed here.
2026-05-21 10:21:09 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 07823373c1 fix(ci): add withGoCache helper and pip cache for UploadToPlayStore
Adds withGoCache() that mounts GOCACHE and GOMODCACHE as Dagger cache
volumes — the standard pattern for any Go container added to the pipeline.
Also adds pip cache to UploadToPlayStore so pip wheel downloads are reused
between Play Store deploys.

Closes #123

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 06:41:04 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 1af2a36af7 fix(ci): remove pub cache volume from pubGetLayer for stable execution cache
flutter pub get was re-running on every CI run because Base() attached a
mutable WithMountedCache volume to /root/.pub-cache, making the execution
cache key unstable. Extract toolchain() without cache mounts; pubGetLayer()
now uses toolchain() so Dagger execution-caches pub get between runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 06:35:14 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 0cb181b138 fix(ci): remove wait from otelrecv cleanup; add pkill by name as fallback
wait "$RECV_PID" was blocking despite kill -9 (possibly because $RECV_PID
was garbled by ANSI escape codes from dagger output, making kill target the
wrong PID). Fix:
- Remove wait entirely — zombie is reaped when the shell exits
- Add pkill -9 -f otelrecv.py as fallback in case kill-by-PID misses
- Log PID at capture time to verify correctness in CI logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 21:09:24 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 320fbcabc3 fix(ci): kill otelrecv with SIGKILL in cleanup, add timing logs, re-enable OTEL
Three changes:
- cleanup() now uses kill -9 instead of kill (SIGTERM) to prevent wait hanging
  if otelrecv's signal handler stalls
- adds [HH:MM:SS] log lines at key points so CI logs show exactly where time is spent
- restores OTEL env vars (via env VAR=val) since they were confirmed not to cause the hang

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 21:00:20 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 f187012f58 debug(ci): temporarily disable OTEL env vars to test if they cause dagger hang
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 20:31:48 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d4b265724e fix(otelrecv): set close_connection=True so server actually closes after response
Sending Connection: close in the header without closing the server-side
socket left both dagger's Go HTTP client and Python's HTTPServer waiting
for the other to send FIN first. This blocked dagger's OTLP exporter
shutdown, which in turn blocked dagger from exiting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 20:14:27 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 36b54a08d6 fix(ci): use --kill-after=10 on timeout so dagger is SIGKILLed if it ignores SIGTERM
dagger ignores SIGTERM, keeping the pipe's write end open; tee can never
get EOF and the script hangs. --kill-after=10 follows up with SIGKILL which
closes the pipe and unblocks the script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 20:02:44 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 4a99d47aa5 fix(ci): add TCP keepalive to stunnel to prevent NAT connection resets
Connection drops consistently at ~50s suggest NAT/firewall idle timeout.
Keepalive probes every 10s on the remote side prevent the RST.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 19:43:16 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 c3737fb47f fix(ci): retry dagger call on TCP connection failures (up to 3 attempts)
On network errors (connection reset, context canceled, connection refused)
retry the dagger call rather than failing immediately. Real test failures
propagate without retry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 18:47:38 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 88e8a9ab5c fix(ci): add 10-minute timeout to dagger call; treat teardown hang as success
dagger call hangs after function completion due to HTTP/2 teardown bug in
remote engine mode. Capture output via tee; if timeout fires but output
contains "All tests passed", exit 0 instead of 124.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 16:38:33 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 a078122d28 refactor(ci): replace dual DAGGER_STUNNEL_URL1/2 with single DAGGER_STUNNEL_URL
The engine is stable; no fallback needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 15:48:38 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 e60459ea2e fix(ci): add .task/ and .fvm/ to .daggerignore to skip walk
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:52:19 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 92cc725913 refactor: simplify .daggerignore and fix hardcoded path after repo move to sharedinbox/
.daggerignore no longer needs to exclude $HOME dirs (fvm/, go/, .pub-cache/,
.claude/, snap/, etc.) since the project root is now sharedinbox/, not $HOME.
agent_loop.py: replace hardcoded /home/si with Path.home().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:43:29 +02:00
Thomas SharedInbox bd03484fcc Revert "fix(ci): kill dagger via timeout when it hangs in gRPC teardown"
This reverts commit 7e155f5785.
2026-05-20 13:11:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 242e1ce4a4 fix(ci): exclude fvm/ and other large dirs from Dagger source sync
The source sync (Directory.Sync in selectFunc) was uploading ~7.4 GB /
78k files to the remote engine, blocking dagger call for 16+ minutes.

Root cause: .daggerignore had '.fvm/' but the actual directory is 'fvm/'
(no leading dot), so the 1.9 GB Flutter SDK cache was always uploaded.
Also missing: go/ pkg cache (309 MB), .claude/ session files, agent logs.

goroutine dump confirmed the hang in directoryValue.Get → Directory.Sync
→ HTTP/2 roundTrip waiting on the engine — not gRPC teardown as suspected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:04:53 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 7e155f5785 fix(ci): kill dagger via timeout when it hangs in gRPC teardown
After tests complete, dagger call hangs in gRPC connection close to the
remote engine — OTEL shuts down cleanly (spans stop) but the process
never exits. Wrapping with timeout 900s and treating exit 124 as success
unblocks CI and lets the OTEL timing report print.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 12:36:13 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 95d114cc38 debug(otelrecv): add stderr logging to diagnose CI hang
Log each POST request, decode step, 200 response, signal receipt, and
server shutdown to understand where the hang occurs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 12:22:04 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d5e3974d94 fix(otelrecv): send explicit Content-Length + Connection: close
Without Content-Length the Go HTTP/1.1 client can't tell the response
body is empty, causing dagger call to hang waiting for more data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 12:07:57 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 1c27dc4f71 fix(ci): use http/protobuf OTEL protocol with binary protobuf receiver
http/json is not supported by the Go OTEL SDK used in Dagger v0.20.8.
Switch to http/protobuf (the SDK default) and rewrite the Python receiver
to decode binary protobuf using stdlib struct — no pip required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 11:46:58 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 691f2beec2 fix(ci): switch timing from OTEL receiver to --progress=plain pipe filter
Dagger v0.20.8 only supports 'grpc' and 'http/protobuf' OTLP protocols;
'http/json' triggers a WARN and exports nothing.  The new approach pipes
dagger's --progress=plain output through a Python script that echoes it
in real-time and prints a timing table at EOF.  No HTTP server, no port
files, no protocol issues — works locally and in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 11:43:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ac2178916e refactor(ci): replace Go OTEL receiver with Python (stdlib, no deps)
python3 is pre-installed on ubuntu-latest so the timing report now also
runs in CI, not just locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 11:30:08 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b10696a41e fix(ci): remove tmp timing file — receiver writes directly to stdout
TIMINGFILE=$(mktemp) was an unnecessary /tmp path. The receiver already
prints its report to stdout on shutdown; wait $RECV_PID captures it in
place. Only PORTFILE remains in /tmp (unique via mktemp, deleted in cleanup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 10:38:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 3471e1fd2c feat(ci): OTEL timing receiver for check-dagger
Adds ci/otelrecv/main.go — a minimal OTLP HTTP/JSON trace receiver that
listens on a random port (port 0) so parallel runs never collide.

The check-dagger Taskfile task now starts the receiver in the background,
passes the port via a mktemp file, runs dagger with OTEL env vars set,
then prints a per-span timing report on shutdown. Falls back to plain
dagger call when Go is not available (e.g. CI containers without Go).

First run will show raw attribute keys so we can learn Dagger's exact
telemetry format and refine the cached/live detection logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 10:27:57 +02:00
Thomas SharedInbox f23328fd1f ci: empty commit to verify cache stability 2026-05-20 09:39:47 +02:00
Thomas SharedInbox 41ac45a92e ci: empty commit to retry after stunnel fix 2026-05-20 09:18:25 +02:00
Thomas SharedInbox c090a320f6 ci: empty commit to verify cache after disk cleanup + parallelization 2026-05-20 09:07:57 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2748517d1c feat(ci): run TestBackend and TestIntegration in parallel
Saves ~1 minute on every CI run by starting the integration test build
concurrently with the backend Stalwart tests instead of waiting for them
to finish first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 08:58:55 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2fd82eadc4 ci: empty commit to verify dart analyze + unit test caching
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 00:09:39 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 e0a99d7d63 fix(ci): remove --no-pub from integration tests; use dart analyze instead of flutter analyze
Integration tests build native Linux app via CMake which requires pub get side effects
(plugin registrant file generation) — --no-pub broke the CMake step.

Switch flutter analyze to dart analyze --fatal-infos to eliminate the flutter wrapper's
non-deterministic state writes to /root/.dartServer/, which were preventing action cache
hits on the analyze step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:57:25 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 7fce683375 fix(ci): add --no-pub to flutter analyze and flutter test execs
Without --no-pub, flutter re-runs pub get internally before each
analyze/test call, writing a fresh package_config.json with new
timestamps. This makes the exec output snapshot non-deterministic
and prevents BuildKit from caching the result across CI runs.

With --no-pub, flutter uses the package_config.json already produced
by pubGetLayer(), and the exec output is stable → persistent cache hits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:30:47 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d369c26470 ci: empty commit to verify cache propagation to dart format and analyze
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:19:19 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ef4bcd1eeb ci: empty commit to verify cache stability after dart-tool-build mount removal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:03:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 39204fabcd fix(ci): remove dart-tool-build cache mount from setup()
Shared mutable cache mounts prevent BuildKit from persistently caching
the exec result across sessions. Without the mount, build_runner output
is stored in the content-addressed snapshot and survives GC cycles,
allowing downstream analyze/test steps to also be stably cached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:46:39 +02:00
Thomas SharedInbox b2a15aee09 ci: empty commit 2026-05-19 22:25:31 +02:00
Thomas SharedInbox 8f66047a2e ci: empty commit — run 369, verify full caching under 50GB reservedSpace 2026-05-19 22:23:54 +02:00
Thomas SharedInbox 4eb06d487a ci: empty commit — verify snapshot retention under 50GB reservedSpace 2026-05-19 22:11:26 +02:00
Thomas SharedInbox 07d90f7d50 ci: empty commit to verify pub-get exec-cache survival after run-365 crash 2026-05-19 19:11:55 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b090342637 fix(ci): revert bad /root/.flutter cache mount — it is a file, not a directory
WithMountedCache requires a directory. /root/.flutter in the cirruslabs/flutter
image is a plain text file (Flutter SDK marker), causing "not a directory" at
container startup. Reverts to the pre-365 Base() so run-364 exec cache entries
are still valid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 19:11:40 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 6dcefa856e fix(ci): mount /root/.flutter as cache volume to keep pub-get snapshot small
Flutter writes tool state to /root/.flutter on every invocation. Without a
cache mount this ends up in the pub-get snapshot, making it large and prone
to GC eviction. Moving it to a cache volume keeps the snapshot tiny so
BuildKit's exec cache for pub get survives between CI runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 19:00:45 +02:00
Thomas SharedInbox 1393a31d61 ci: empty commit to verify pub get fully cached after date_created fix 2026-05-19 18:45:40 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 59dfb0cfb3 fix(ci): also strip date_created from .flutter-plugins-dependencies
flutter pub get writes a date_created timestamp into .flutter-plugins-
dependencies in addition to the generated field in package_config.json.
Both files are part of the pub-get execution snapshot, so both timestamps
must be removed to make the layer deterministic and cacheable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 18:33:20 +02:00
Thomas SharedInbox df0c3910cb ci: empty commit to verify pub get determinism caching 2026-05-19 17:59:50 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 4daf47f7a3 fix(ci): make pub get layer deterministic to enable test caching
Remove non-deterministic "generated" and "generatorVersion" fields from
.dart_tool/package_config.json after flutter pub get, so the snapshot
hash is stable across runs and all downstream test steps can be cached.
Mount only .dart_tool/build as a mutable cache volume so the incremental
build graph persists without polluting the deterministic snapshot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 17:39:20 +02:00