## Summary
- The CI workflow used `on: [push, pull_request]`, which fires **two** runs whenever a commit is pushed to a branch with an open PR — one for the `push` event and one for the `pull_request` event.
- Scoped the `push` trigger to `branches: [main]` only. Feature-branch pushes now trigger only via `pull_request`; direct pushes to `main` (merge commits) still trigger via `push`.
## Test plan
- [ ] Open a PR and push a new commit — verify only one CI run appears, not two
- [ ] Merge a PR to `main` — verify CI still runs via the `push` trigger
Closes#483
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/490
Closes#415
## Summary
- Adds missing `timeout-minutes` to `ci.yml` (`check` job, 60 min) and `windows-nightly.yml` (90 min, ready for when the Windows runner is registered)
- Wraps `ssh-keyscan` and `ssh -f -N -L` tunnel creation in `setup_dagger_remote.sh` with `timeout 30`; emits a `::warning::` annotation when either takes more than 10 s
- Adds `timeout --kill-after=10 <N>` to all bare `dagger call` invocations in `Taskfile.yml`: 600 s for test/query tasks, 1800 s for build/deploy tasks, 60 s for `ci-graph`; `stalwart` and `check-dagger` (already protected) left untouched
- Adds `timeout --kill-after=10 2400` per attempt in `run_firebase_test.sh`; emits `::warning::` on exit 124 instead of silently retrying
## Test plan
- CI passes on this PR (the `check` job now has `timeout-minutes: 60` and will self-enforce)
- All `dagger call` lines in `Taskfile.yml` now have a `timeout` prefix (visible in the diff)
- `setup_dagger_remote.sh` logic is unchanged — only the two network calls are wrapped
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/432
## Summary
Fixes three distinct failures from CI deploy run #1424 and concurrent website update failures.
- **Play Store job**: `pip install google-auth requests` fails on Ubuntu 24.04 with PEP 668. Fixed by using `python3 -m venv` for an isolated install.
- **SSH key error (APK, Linux, website jobs)**: All SSH/rsync steps fail with `Load key "/root/.ssh/id_ed25519": error in libcrypto` inside the Dagger Alpine 3.21 container. This is the first time these jobs actually ran (all previous deploy runs had every job skipped). Two fixes:
- `setup_dagger_remote.sh`: `export_secret` was appending an extra trailing newline to values (like SSH private keys) that already end with `\n`. Now only adds one when needed.
- `ci/main.go` `Deployer`: mounts the key at a `.raw` path, strips Windows-style CRLF endings with `tr -d '\r'`, then writes the normalised key to `id_ed25519`. CRLF bytes cause "error in libcrypto" in Alpine's LibreSSL-backed openssh.
## Test plan
- [ ] Deploy run triggers after merge; all three deploy jobs complete
- [ ] Play Store verification step passes
- [ ] SSH commands in Alpine load the key without `error in libcrypto`
Closes#366
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/369
Closes#361
Three bugs in the hourly deploy workflow's change-detection logic caused the Play Store to silently fall behind whenever a deploy failed or all-android jobs were skipped.
**Bug 1 (primary): commit_sha → head_sha**
Forgejo's API returns head_sha; commit_sha was always None. This meant LAST_DEPLOYED_SHA was always empty, so the diff fell back to HEAD~1..HEAD — only the single most recent commit was inspected. If android changes landed in an earlier commit, they were silently missed.
**Bug 2: Skipped runs counted as 'deployed'**
A workflow run where deploy-playstore was skipped (android=false) has status=success, so it was treated as a successful deploy. Now the code queries each run's job results and only trusts a run where the 'Build & Deploy to Play Store' job's own conclusion=success.
**Bug 3: Narrow fallback when SHA unknown**
When LAST_DEPLOYED_SHA could not be determined the workflow diffed HEAD~1..HEAD — potentially missing many commits. Now it defaults to android=true / linux=true (deploy everything) as the safe fallback.
Additional changes:
- ::error:: / ::warning:: / ::notice:: annotations so skip/failure reasons surface in the Actions UI.
- scripts/verify_playstore_deploy.py: new post-deploy check that queries the internal track and fails if the latest version code is more than 1 hour old. (Version codes are Unix timestamps set by ci/main.go's PublishAndroid.) Catches silent deploy failures the upload API did not reject.
- scripts/test_verify_playstore_deploy.py: 5 unit tests for the verify script (all pass).
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/364
## Summary
- The \`Detect Changed Files\` step in \`deploy.yml\` previously set \`android=false\` / \`linux=false\` silently, leaving downstream jobs showing only "skipped" in CI with no visible cause
- Now each decision emits a clear one-liner in the step log:
- \`Android deploy: SKIPPED (no android-relevant files changed)\`
- \`Android deploy: TRIGGERED (android-relevant files changed)\`
- \`Linux deploy: SKIPPED (no linux-relevant files changed)\`
- or \`HEAD <sha> already successfully deployed — skipping all deploy jobs\`
- The skip reason is visible in the \`check-changes\` job output, which is the job that makes the decision
Closes#353
## Test plan
- [ ] Trigger the deploy workflow on a commit that only touches CI/docs files — \`check-changes\` step log should show "Android deploy: SKIPPED (no android-relevant files changed)"
- [ ] Trigger the deploy workflow on a commit touching \`lib/\` — log should show "Android deploy: TRIGGERED"
- [ ] Trigger a second run on the same commit — log should show "already successfully deployed — skipping all deploy jobs"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/357
Three unguarded blocking calls caused CI to hang until the 60-min timeout:
- dagger query prune steps had no timeout; || true only catches errors, not hangs
- docker info (added in d905cd6) had no timeout if Docker socket is unresponsive
- until portfile loop in check-dagger spun forever if otel-receiver.py crashed
Fixes: timeout 120 on all dagger query prune calls, timeout 30 on docker info,
and a kill -0 process-alive guard on the portfile until loop with fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from the bespoke 1136-line Python orchestrator to the community
agentloop tool (https://github.com/guettli/agentloop). The new tool
handles the issue → agent → PR pipeline via a label state machine using
loop/plan and loop/code labels, running every 5 minutes via cron.
Removes: scripts/agent_loop.py, scripts/test_agent_loop.py
Removes: .forgejo/workflows/monitor.yml (no heartbeat concept in agentloop)
Updates: AGENTS.md to document the new loop/ label workflow
agentloop config lives in ~/agentloop/loop/sharedinbox/ on the host.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Forgejo's API returns head_sha=null in workflow run objects; the correct
field is commit_sha. The skip-check always got None, so every hourly
schedule triggered a full redeploy of the same commit.
Move Android Firebase instrumented tests out of deploy.yml into a new
firebase-tests.yml workflow that runs once per day (3 AM UTC) and only
when Firebase-relevant files changed in the last 24 hours. On failure,
the workflow automatically creates a Forgejo issue labelled "Ready" with
instructions to find the root cause and fix it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace `task website-deploy` (which calls `hugo` directly and fails
because Hugo is not installed on the CI runner) with the Dagger-based
`task publish-website`, matching the pattern used by other jobs in
deploy.yml. Also adds Dagger remote engine setup, runner tool checks,
SSH_KNOWN_HOSTS secret, a timeout, and TLS credential cleanup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- The hourly `deploy.yml` schedule re-deployed the same commit repeatedly because it always diffed `HEAD~1..HEAD` — once a commit touching `lib/`/`pubspec.*` became HEAD, every hourly tick would detect "android changes" and deploy again.
- Fix: at the start of the `check-changes` job, query the Forgejo workflow runs API for the last successful `deploy.yml` run. If its `head_sha` matches current HEAD, output `android=false` / `linux=false` immediately, skipping all downstream jobs.
- `workflow_dispatch` bypasses this check (always deploys), matching the existing behaviour.
## Test plan
- [ ] Verify the `check-changes` job exits early on the next scheduled run after a successful deploy of the same commit
- [ ] Verify a new commit still triggers deployment normally
- [ ] Verify `workflow_dispatch` still deploys unconditionally
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/265
Adds a Renovate() Dagger function using the forgejo platform and a
.forgejo/workflows/renovate.yml workflow triggered at 06:00 UTC daily.
Uses RENOVATE_FORGEJO_TOKEN secret; no dedicated Renovate service account needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Track a heartbeat timestamp in ~/.sharedinbox-agent-heartbeat at the
start of each _run_loop() invocation so we can tell when it last ran.
- Add `agent_loop.py monitor` subcommand that exits 1 with a WARNING
message if the heartbeat is missing, corrupted, or older than 2 hours.
- Add .forgejo/workflows/monitor.yml scheduled workflow that runs the
monitor check every 2 hours on the self-hosted runner; a CI failure
serves as the warning when the loop is stalled.
- Add 7 unit tests covering all monitor / heartbeat scenarios.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- agent_loop.py: create log dir with mode 0700 and enforce it on
existing dirs; open log files with mode 0600; chmod state file
to 0600 after every write. Prevents other local processes from
reading agent output (which may contain credential paths) or
tampering with the state file's pid field.
- ci/main.go (TestAndroidFirebase): replace
echo "$FIREBASE_SA_KEY" > /tmp/key.json
with bash process substitution
--key-file=<(echo "$FIREBASE_SA_KEY")
The key is now passed via a file descriptor — it never touches
disk, so it cannot be stranded by a failed gcloud auth call or
snapshotted into the Dagger layer cache.
- ci.yml / deploy.yml: add "Cleanup TLS credentials" step
(if: always()) at the end of every job that calls
setup_dagger_remote.sh. Removes /tmp/dagger-tls,
/tmp/stunnel-dagger.conf, /tmp/stunnel.pid from the self-hosted
runner after each job, so client certs do not accumulate between
job runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- agent_loop.py: agents now create an `issue-N-fix` branch and open a PR;
the loop discovers the PR via `fgj pr list`, tracks its CI run, squash-merges
on green, and falls back to the global-CI path if no PR exists (backward compat).
Adds `_find_pr_for_branch`, `_latest_ci_run_for_branch`, `_merge_pr` helpers.
- .forgejo/workflows/ci.yml: strip to the single fast `check` job only
(removes build-linux, deploy-playstore, publish-website).
- .forgejo/workflows/deploy.yml (new, replaces android-emulator-tests.yml):
scheduled hourly + workflow_dispatch; runs firebase tests, Play Store deploy,
Linux build/deploy, website publish; on completion sets CI/Full-Pass or
CI/Full-Fail label on the repo's DEPLOY_HEALTH_ISSUE tracking issue.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace curl-based install of dagger/task with a hard check that
fails immediately if any tool is missing from the runner image,
pointing to .forgejo/Dockerfile as the fix location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Based on ghcr.io/catthehacker/ubuntu:go-24.04 with stunnel4,
netcat-openbsd, dagger v0.20.8 and task v3.48.0 baked in so
nothing is downloaded during CI runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace apt-get install with a hard check — if the packages are missing
the job fails immediately with a clear error. Avoids flaky failures when
archive.ubuntu.com is unreachable.
Install once on the runner: sudo apt-get install -y stunnel4 netcat-openbsd
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements issue #132. Builds debug app APK + androidTest APK via Dagger,
then runs them on Firebase Test Lab using the FIREBASE_TEST_LAB_SERVICE_ACCOUNT_KEY
secret and FIREBASE_PROJECT_ID variable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>