123 Commits
Author SHA1 Message Date
0297701829 ci: automate dev container build via devcontainer.json + workflow (#553)
Closes #552

## Summary
- Add `.devcontainer/devcontainer.json` pointing at `../Dockerfile.dev` so VS Code / Codespaces / any devcontainer-aware tool can build the dev environment directly from source.
- Add `.forgejo/workflows/publish-dev-container.yml` that rebuilds `Dockerfile.dev` and pushes it to `codeberg.org/guettli/sharedinbox-dev` whenever `Dockerfile.dev`, the devcontainer config, or the workflow itself changes on `main`. The image is tagged both `:latest` and with the short commit SHA for pinnable references.
- The workflow uses the built-in `FORGEJO_TOKEN` to log in to Codeberg's container registry — no extra secrets required.

## Notes
- No existing references to `ghcr.io/guettli/sharedinbox-dev` were found in the repo, so issue step 3 (updating image references) is a no-op here.
- `workflow_dispatch` is also enabled so the image can be rebuilt manually if needed.

## Verification
- `python3 -c "import json; json.load(...)"` parses the devcontainer config.
- `python3 -c "import yaml; yaml.safe_load(...)"` parses the workflow.
- Triggers (paths filter) match the source files the issue identifies as drift risks.

Co-authored-by: Thomas Güttler <tilldu@googlemail.com>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/553
2026-06-09 21:31:45 +02:00
ee238b85c7 fix(ci): set loop/code label on Firebase test failure issues (#551)
Closes #550

## Summary

When Firebase instrumented tests fail in the nightly run, the workflow opens a tracking issue. It currently tags it with the legacy `Ready` label, which is not part of the current agent loop. Switch the label to `loop/code` so the coding agent picks it up automatically and the error gets fixed.

## Change

- `.forgejo/workflows/firebase-tests.yml`: set `loop/code` instead of `Ready` on the created failure issue.

## Test plan

- [ ] Wait for next scheduled (or manually dispatched) Firebase test failure and confirm the created issue carries the `loop/code` label.

Co-authored-by: guettlibot <tilldu@googlemail.com>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/551
2026-06-09 16:08:19 +02:00
Bot of Thomas Güttler a227f8607c fix(ci): use endpoints that exist in Forgejo for wait-time + LAST_DEPLOYED_SHA (#529) 2026-06-07 14:02:01 +02:00
Bot of Thomas Güttler 5db5d957ab fix(ci): use /actions/runs endpoint in remaining wait-time steps (#524) 2026-06-07 06:59:00 +02:00
Bot of Thomas Güttler 0dd1d7232b fix(ci): use /actions/runs endpoint in deploy.yml wait-time steps (#522) 2026-06-07 06:33:57 +02:00
Bot of Thomas Güttler e4cc92867e ci(website): add change detection to skip unconditional hourly deploys (#515) 2026-06-07 05:04:58 +02:00
Bot of Thomas Güttler e2bb299300 fix(ci): exclude chaos_monkey_test from regular CI (#518) 2026-06-07 04:24:10 +02:00
Bot of Thomas Güttler d55b316d4c ci: add concurrency cancel-in-progress to ci.yml (#516) 2026-06-07 02:40:13 +02:00
Bot of Thomas Güttler f7fd30da15 feat(ci): add Print runner wait time step to all workflow jobs (#517) 2026-06-07 02:40:08 +02:00
913f9e8855 fix: prevent duplicate CI runs on pull request pushes (#490)
## Summary

- The CI workflow used `on: [push, pull_request]`, which fires **two** runs whenever a commit is pushed to a branch with an open PR — one for the `push` event and one for the `pull_request` event.
- Scoped the `push` trigger to `branches: [main]` only. Feature-branch pushes now trigger only via `pull_request`; direct pushes to `main` (merge commits) still trigger via `push`.

## Test plan

- [ ] Open a PR and push a new commit — verify only one CI run appears, not two
- [ ] Merge a PR to `main` — verify CI still runs via the `push` trigger

Closes #483

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/490
2026-06-06 21:43:46 +02:00
6a60c8d73b fix: resolve dart analyze failures in chaos_monkey_test.dart (#458)
## Summary

Fixes CI failures introduced by PR #455 (chaos monkey backend test).

The `dart analyze --fatal-infos` step in CI was failing because `test/backend/chaos_monkey_test.dart` had:

- **`avoid_print`** (5 instances): replaced `print(...)` with `stdout.writeln(...)` — `dart:io` is already imported
- **`avoid_redundant_argument_values`**: removed redundant `''` from `_env('CHAOS_SEED', '')` since `''` is the parameter default
- **`dart format`**: applied formatter fixes (trailing commas, line wrapping for long `connectToServer` calls)

## Verification

```
$ nix develop --command bash -c "fvm dart analyze --fatal-infos"
Analyzing 456...
No issues found!

$ nix develop --command bash -c "fvm dart format --output=none --set-exit-if-changed test/backend/chaos_monkey_test.dart"
Formatted 1 file (0 changed) in 0.01 seconds.
```

Closes #456

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/458
2026-06-06 05:29:40 +02:00
8718339b4e ci: add timeouts to all CI/CD jobs, Dagger tasks, and runner scripts (#432)
Closes #415

## Summary

- Adds missing `timeout-minutes` to `ci.yml` (`check` job, 60 min) and `windows-nightly.yml` (90 min, ready for when the Windows runner is registered)
- Wraps `ssh-keyscan` and `ssh -f -N -L` tunnel creation in `setup_dagger_remote.sh` with `timeout 30`; emits a `::warning::` annotation when either takes more than 10 s
- Adds `timeout --kill-after=10 <N>` to all bare `dagger call` invocations in `Taskfile.yml`: 600 s for test/query tasks, 1800 s for build/deploy tasks, 60 s for `ci-graph`; `stalwart` and `check-dagger` (already protected) left untouched
- Adds `timeout --kill-after=10 2400` per attempt in `run_firebase_test.sh`; emits `::warning::` on exit 124 instead of silently retrying

## Test plan

- CI passes on this PR (the `check` job now has `timeout-minutes: 60` and will self-enforce)
- All `dagger call` lines in `Taskfile.yml` now have a `timeout` prefix (visible in the diff)
- `setup_dagger_remote.sh` logic is unchanged — only the two network calls are wrapped

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/432
2026-06-05 11:49:30 +02:00
29c2c7e96c fix: three deploy failures from run #1424 (#369)
## Summary

Fixes three distinct failures from CI deploy run #1424 and concurrent website update failures.

- **Play Store job**: `pip install google-auth requests` fails on Ubuntu 24.04 with PEP 668. Fixed by using `python3 -m venv` for an isolated install.
- **SSH key error (APK, Linux, website jobs)**: All SSH/rsync steps fail with `Load key "/root/.ssh/id_ed25519": error in libcrypto` inside the Dagger Alpine 3.21 container. This is the first time these jobs actually ran (all previous deploy runs had every job skipped). Two fixes:
  - `setup_dagger_remote.sh`: `export_secret` was appending an extra trailing newline to values (like SSH private keys) that already end with `\n`. Now only adds one when needed.
  - `ci/main.go` `Deployer`: mounts the key at a `.raw` path, strips Windows-style CRLF endings with `tr -d '\r'`, then writes the normalised key to `id_ed25519`. CRLF bytes cause "error in libcrypto" in Alpine's LibreSSL-backed openssh.

## Test plan
- [ ] Deploy run triggers after merge; all three deploy jobs complete
- [ ] Play Store verification step passes
- [ ] SSH commands in Alpine load the key without `error in libcrypto`

Closes #366

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/369
2026-06-03 21:23:13 +02:00
6a097976d3 fix: correct LAST_DEPLOYED_SHA detection so Play Store always gets updated (#364)
Closes #361

Three bugs in the hourly deploy workflow's change-detection logic caused the Play Store to silently fall behind whenever a deploy failed or all-android jobs were skipped.

**Bug 1 (primary): commit_sha → head_sha**
Forgejo's API returns head_sha; commit_sha was always None. This meant LAST_DEPLOYED_SHA was always empty, so the diff fell back to HEAD~1..HEAD — only the single most recent commit was inspected. If android changes landed in an earlier commit, they were silently missed.

**Bug 2: Skipped runs counted as 'deployed'**
A workflow run where deploy-playstore was skipped (android=false) has status=success, so it was treated as a successful deploy. Now the code queries each run's job results and only trusts a run where the 'Build & Deploy to Play Store' job's own conclusion=success.

**Bug 3: Narrow fallback when SHA unknown**
When LAST_DEPLOYED_SHA could not be determined the workflow diffed HEAD~1..HEAD — potentially missing many commits. Now it defaults to android=true / linux=true (deploy everything) as the safe fallback.

Additional changes:
- ::error:: / ::warning:: / ::notice:: annotations so skip/failure reasons surface in the Actions UI.
- scripts/verify_playstore_deploy.py: new post-deploy check that queries the internal track and fails if the latest version code is more than 1 hour old. (Version codes are Unix timestamps set by ci/main.go's PublishAndroid.) Catches silent deploy failures the upload API did not reject.
- scripts/test_verify_playstore_deploy.py: 5 unit tests for the verify script (all pass).

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/364
2026-06-03 19:26:00 +02:00
Thomas SharedInbox 761378f583 Dockerfile. 2026-06-03 17:30:30 +02:00
9605c5e3b7 ci: print explicit reason when deploy jobs are skipped (#357)
## Summary

- The \`Detect Changed Files\` step in \`deploy.yml\` previously set \`android=false\` / \`linux=false\` silently, leaving downstream jobs showing only "skipped" in CI with no visible cause
- Now each decision emits a clear one-liner in the step log:
  - \`Android deploy: SKIPPED (no android-relevant files changed)\`
  - \`Android deploy: TRIGGERED (android-relevant files changed)\`
  - \`Linux deploy: SKIPPED (no linux-relevant files changed)\`
  - or \`HEAD <sha> already successfully deployed — skipping all deploy jobs\`
- The skip reason is visible in the \`check-changes\` job output, which is the job that makes the decision

Closes #353

## Test plan

- [ ] Trigger the deploy workflow on a commit that only touches CI/docs files — \`check-changes\` step log should show "Android deploy: SKIPPED (no android-relevant files changed)"
- [ ] Trigger the deploy workflow on a commit touching \`lib/\` — log should show "Android deploy: TRIGGERED"
- [ ] Trigger a second run on the same commit — log should show "already successfully deployed — skipping all deploy jobs"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/357
2026-06-03 13:27:29 +02:00
Bot of Thomas Güttler 2747c4e63d chore: migrate CI secrets from Forgejo to SOPS (#354) 2026-06-03 06:37:07 +02:00
Thomas Güttler b0a09939c9 chore: migrate all workflows to SSH-based Dagger engine and remove stunnel legacy 2026-06-02 17:40:35 +02:00
Thomas Güttler 3520f161e3 fix: update website workflow with correct Dagger setup and SOPS_AGE_KEY 2026-06-02 17:00:54 +02:00
Thomas Güttler 9744fe1379 debug: extremely simplify ci.yml 2026-06-02 13:22:05 +02:00
Thomas Güttler 43eafbd4c2 debug: simplify workflow triggers to fix parsing error 2026-06-02 13:18:28 +02:00
Thomas Güttler 180035ec55 fix: re-apply ci.yml with clean format 2026-06-02 12:50:39 +02:00
Thomas Güttler ec3ebfa4a3 fix: update CI workflow for SSH/SOPS and SOPS_AGE_KEY 2026-06-02 12:44:35 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ea5d119706 fix: add timeouts to dagger query, docker info, and portfile loop (#347)
Three unguarded blocking calls caused CI to hang until the 60-min timeout:
- dagger query prune steps had no timeout; || true only catches errors, not hangs
- docker info (added in d905cd6) had no timeout if Docker socket is unresponsive
- until portfile loop in check-dagger spun forever if otel-receiver.py crashed

Fixes: timeout 120 on all dagger query prune calls, timeout 30 on docker info,
and a kill -0 process-alive guard on the portfile until loop with fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 21:43:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 968db75c69 feat: replace agent_loop.py with agentloop
Switch from the bespoke 1136-line Python orchestrator to the community
agentloop tool (https://github.com/guettli/agentloop). The new tool
handles the issue → agent → PR pipeline via a label state machine using
loop/plan and loop/code labels, running every 5 minutes via cron.

Removes: scripts/agent_loop.py, scripts/test_agent_loop.py
Removes: .forgejo/workflows/monitor.yml (no heartbeat concept in agentloop)
Updates: AGENTS.md to document the new loop/ label workflow

agentloop config lives in ~/agentloop/loop/sharedinbox/ on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 09:20:48 +02:00
Bot of Thomas Güttler 91083218d4 fix: diff from last deployed SHA to catch all changes since last deploy (#320) (#332) 2026-05-29 17:34:21 +02:00
Bot of Thomas Güttler adc4eb6f6d feat: remove publish-website from deploy.yml, schedule website.yml hourly (#325) (#330) 2026-05-29 12:53:18 +02:00
Bot of Thomas Güttler dbb29fb76a fix: rename workflow to Update Website and guard verify step (#282) (#283) 2026-05-27 20:00:39 +02:00
Bot of Thomas Güttler 2f975829e5 feat: auto-merge safe Renovate PRs via CI (#277) (#284) 2026-05-27 09:37:15 +02:00
Thomas SharedInbox a8d6ec5861 fix: use commit_sha instead of head_sha to detect already-deployed commits
Forgejo's API returns head_sha=null in workflow run objects; the correct
field is commit_sha. The skip-check always got None, so every hourly
schedule triggered a full redeploy of the same commit.
2026-05-26 15:22:23 +02:00
Thomas SharedInbox e22c4aa88d fix: use Dagger for website deploy and record Renovate Bot completion (#267, #268) 2026-05-26 15:09:59 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 720c54433a feat: run Firebase tests once daily via dedicated workflow (#272)
Move Android Firebase instrumented tests out of deploy.yml into a new
firebase-tests.yml workflow that runs once per day (3 AM UTC) and only
when Firebase-relevant files changed in the last 24 hours. On failure,
the workflow automatically creates a Forgejo issue labelled "Ready" with
instructions to find the root cause and fix it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:48:10 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2747ff0dca fix: use Dagger for website deploy instead of bare hugo call (#267)
Replace `task website-deploy` (which calls `hugo` directly and fails
because Hugo is not installed on the CI runner) with the Dagger-based
`task publish-website`, matching the pattern used by other jobs in
deploy.yml. Also adds Dagger remote engine setup, runner tool checks,
SSH_KNOWN_HOSTS secret, a timeout, and TLS credential cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:01:37 +02:00
c97e3d505f fix: skip deploy when HEAD already successfully deployed (#264) (#265)
## Summary

- The hourly `deploy.yml` schedule re-deployed the same commit repeatedly because it always diffed `HEAD~1..HEAD` — once a commit touching `lib/`/`pubspec.*` became HEAD, every hourly tick would detect "android changes" and deploy again.
- Fix: at the start of the `check-changes` job, query the Forgejo workflow runs API for the last successful `deploy.yml` run. If its `head_sha` matches current HEAD, output `android=false` / `linux=false` immediately, skipping all downstream jobs.
- `workflow_dispatch` bypasses this check (always deploys), matching the existing behaviour.

## Test plan

- [ ] Verify the `check-changes` job exits early on the next scheduled run after a successful deploy of the same commit
- [ ] Verify a new commit still triggers deployment normally
- [ ] Verify `workflow_dispatch` still deploys unconditionally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/265
2026-05-26 07:35:18 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 2bb7ac11df feat: add runner tools check and LOG_LEVEL to Renovate Bot (#257)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 06:24:47 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 4ada3798b6 feat: run Renovate via Dagger on daily schedule (#257, #216)
Adds a Renovate() Dagger function using the forgejo platform and a
.forgejo/workflows/renovate.yml workflow triggered at 06:00 UTC daily.
Uses RENOVATE_FORGEJO_TOKEN secret; no dedicated Renovate service account needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 21:26:44 +02:00
Bot of Thomas Güttler a7783d46cf fix: disable Save button when no password available; fix changelog fetch-depth (#246, #229) (#248) 2026-05-25 14:47:25 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 06df3ee200 feat: monitor agent loop health every 2 hours (#217)
- Track a heartbeat timestamp in ~/.sharedinbox-agent-heartbeat at the
  start of each _run_loop() invocation so we can tell when it last ran.
- Add `agent_loop.py monitor` subcommand that exits 1 with a WARNING
  message if the heartbeat is missing, corrupted, or older than 2 hours.
- Add .forgejo/workflows/monitor.yml scheduled workflow that runs the
  monitor check every 2 hours on the self-hosted runner; a CI failure
  serves as the warning when the loop is stalled.
- Add 7 unit tests covering all monitor / heartbeat scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 12:48:45 +02:00
Bot of Thomas Güttler 32ba916cbf fix: trigger deploy on script changes, add changelog dep, deepen fetch (#228) (#233) 2026-05-24 21:05:10 +02:00
Thomas SharedInbox b2c11e0c63 Revert "feat: keep secrets in sync via age-encrypted master key (#208) (#223)"
This reverts commit 96b1660b59.
2026-05-24 18:39:23 +02:00
Bot of Thomas Güttler 96b1660b59 feat: keep secrets in sync via age-encrypted master key (#208) (#223) 2026-05-24 16:35:10 +02:00
Bot of Thomas Güttler 37eca207c6 fix: pin SSH host key via known_hosts instead of StrictHostKeyChecking=no (#161) (#181) 2026-05-24 13:00:04 +02:00
Bot of Thomas Güttler 30bcc8a314 fix: skip CI jobs when unrelated files change (#144) (#207) 2026-05-24 08:30:10 +02:00
Bot of Thomas Güttler 5c38357033 fix: limit dagger-data volume growth by pruning named caches (#193) (#197) 2026-05-24 06:00:14 +02:00
Bot of Thomas Güttler 71ccf24d0c fix: survive permanently broken path_provider channel on Android (#192) (#194) 2026-05-24 03:50:07 +02:00
Bot of Thomas Güttler 833e8d49b0 fix: remove continue-on-error from CI workflows (#172) (#189) 2026-05-23 19:05:08 +02:00
Bot of Thomas Güttler 6adba9b001 perf: parallelize APK deploy and reduce fetch-depth in deploy.yml (#171) (#188) 2026-05-23 18:55:08 +02:00
Bot of Thomas Güttler 1b1f9788fd docs: explain why continue-on-error is intentional on deploy steps (#154) (#177) 2026-05-23 15:30:14 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b6a2f91820 security: fix log/state file permissions, Firebase key on disk, TLS cleanup
- agent_loop.py: create log dir with mode 0700 and enforce it on
  existing dirs; open log files with mode 0600; chmod state file
  to 0600 after every write. Prevents other local processes from
  reading agent output (which may contain credential paths) or
  tampering with the state file's pid field.

- ci/main.go (TestAndroidFirebase): replace
    echo "$FIREBASE_SA_KEY" > /tmp/key.json
  with bash process substitution
    --key-file=<(echo "$FIREBASE_SA_KEY")
  The key is now passed via a file descriptor — it never touches
  disk, so it cannot be stranded by a failed gcloud auth call or
  snapshotted into the Dagger layer cache.

- ci.yml / deploy.yml: add "Cleanup TLS credentials" step
  (if: always()) at the end of every job that calls
  setup_dagger_remote.sh. Removes /tmp/dagger-tls,
  /tmp/stunnel-dagger.conf, /tmp/stunnel.pid from the self-hosted
  runner after each job, so client certs do not accumulate between
  job runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 10:54:53 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 9cd18ba70e feat: agent loop uses PRs; ci.yml fast-only; hourly deploy workflow (#156)
- agent_loop.py: agents now create an `issue-N-fix` branch and open a PR;
  the loop discovers the PR via `fgj pr list`, tracks its CI run, squash-merges
  on green, and falls back to the global-CI path if no PR exists (backward compat).
  Adds `_find_pr_for_branch`, `_latest_ci_run_for_branch`, `_merge_pr` helpers.

- .forgejo/workflows/ci.yml: strip to the single fast `check` job only
  (removes build-linux, deploy-playstore, publish-website).

- .forgejo/workflows/deploy.yml (new, replaces android-emulator-tests.yml):
  scheduled hourly + workflow_dispatch; runs firebase tests, Play Store deploy,
  Linux build/deploy, website publish; on completion sets CI/Full-Pass or
  CI/Full-Fail label on the repo's DEPLOY_HEALTH_ISSUE tracking issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 22:05:09 +02:00