Commit Graph
128 Commits
Author SHA1 Message Date
Thomas Güttler ba21b802eb fix: use _EXPERIMENTAL_DAGGER_RUNNER_HOST for Dagger SSH redirection 2026-06-02 13:31:11 +02:00
Thomas Güttler 7974c28102 fix: use absolute path for dagger in ssh wrapper 2026-06-02 13:23:41 +02:00
Thomas Güttler e5c5dc9db8 fix: add IdentitiesOnly=yes to SSH config for Dagger 2026-06-02 13:20:20 +02:00
Thomas Güttler 6703ffd69b fix: use explicit ssh wrapper for dagger commands 2026-06-02 13:19:16 +02:00
Thomas Güttler ee1fccf340 fix: use _EXPERIMENTAL_DAGGER_RUNNER_HOST for SSH redirection 2026-06-02 13:16:33 +02:00
Thomas Güttler 5757176937 debug: add SSH connection test to setup_dagger_remote.sh 2026-06-02 12:51:41 +02:00
Thomas Güttler 8ee411d1c8 fix: use --output-type json for SOPS decryption 2026-06-02 12:45:34 +02:00
Thomas Güttler 1e2d1b6063 chore: migrate to SOPS and SSH for Dagger engine access 2026-06-02 11:10:29 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ea5d119706 fix: add timeouts to dagger query, docker info, and portfile loop (#347)
Three unguarded blocking calls caused CI to hang until the 60-min timeout:
- dagger query prune steps had no timeout; || true only catches errors, not hangs
- docker info (added in d905cd6) had no timeout if Docker socket is unresponsive
- until portfile loop in check-dagger spun forever if otel-receiver.py crashed

Fixes: timeout 120 on all dagger query prune calls, timeout 30 on docker info,
and a kill -0 process-alive guard on the portfile until loop with fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 21:43:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 968db75c69 feat: replace agent_loop.py with agentloop
Switch from the bespoke 1136-line Python orchestrator to the community
agentloop tool (https://github.com/guettli/agentloop). The new tool
handles the issue → agent → PR pipeline via a label state machine using
loop/plan and loop/code labels, running every 5 minutes via cron.

Removes: scripts/agent_loop.py, scripts/test_agent_loop.py
Removes: .forgejo/workflows/monitor.yml (no heartbeat concept in agentloop)
Updates: AGENTS.md to document the new loop/ label workflow

agentloop config lives in ~/agentloop/loop/sharedinbox/ on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 09:20:48 +02:00
Bot of Thomas Güttler d905cd653f fix: check Docker availability before falling back to local Dagger engine (#329) (#333) 2026-05-29 23:19:14 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 e21cde0a3c fix: allow forgejo-actions as issue author in agent loop
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 21:52:56 +02:00
Bot of Thomas Güttler 50a6678ec2 feat: reimplement user preferences, archive, configurable navigation (#315) (#324) 2026-05-29 19:08:12 +02:00
Bot of Thomas Güttler adc4eb6f6d feat: remove publish-website from deploy.yml, schedule website.yml hourly (#325) (#330) 2026-05-29 12:53:18 +02:00
Bot of Thomas Güttler c45775be92 fix: move sync health report to own row below each account (#311) (#322) 2026-05-28 06:53:11 +02:00
Bot of Thomas Güttler a5928c1aa6 fix: add _tea_get and merged-PR catch-up to close issues on merge (#305) (#310) 2026-05-28 00:07:13 +02:00
Bot of Thomas Güttler 7f3cd43d6e feat: add --dangerously-skip-permissions to claude --resume output (#304) (#309) 2026-05-27 23:48:12 +02:00
Bot of Thomas Güttler 41550eb4b5 feat: configurable menu bar position for mailbox view (#298) (#303) 2026-05-27 22:07:12 +02:00
Bot of Thomas Güttler 5ddfe68467 feat: catch up Renovate PRs with passing CI in agent loop (#289) (#293) 2026-05-27 20:09:13 +02:00
Bot of Thomas Güttler c0dd13be5d feat: align single and multi-mail actions, add archive (#287) (#291) 2026-05-27 19:36:13 +02:00
Bot of Thomas Güttler 73bbfd2694 fix: add explicit note that app settings are never uploaded (#280) (#281) 2026-05-27 08:25:20 +02:00
Thomas SharedInbox 49e6b335d9 better err msg in agent-loop. 2026-05-27 08:14:42 +02:00
Bot of Thomas Güttler f57a8c502d feat: syncLog add Copy button, stack trace, isPermanent (#266) (#269) 2026-05-26 07:55:07 +02:00
Bot of Thomas Güttler 8709e9f38d feat: add Locale, Text Scale, DB Schema Version, Device Model to About page (#258) (#263) 2026-05-25 22:18:09 +02:00
Bot of Thomas Güttler bb475a2350 fix: auto-resolve merge failures instead of asking for manual merge (#253) (#256) 2026-05-25 19:38:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 06df3ee200 feat: monitor agent loop health every 2 hours (#217)
- Track a heartbeat timestamp in ~/.sharedinbox-agent-heartbeat at the
  start of each _run_loop() invocation so we can tell when it last ran.
- Add `agent_loop.py monitor` subcommand that exits 1 with a WARNING
  message if the heartbeat is missing, corrupted, or older than 2 hours.
- Add .forgejo/workflows/monitor.yml scheduled workflow that runs the
  monitor check every 2 hours on the self-hosted runner; a CI failure
  serves as the warning when the loop is stalled.
- Add 7 unit tests covering all monitor / heartbeat scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 12:48:45 +02:00
27bef3356e fix: skip catch-up merge retry when issue has State/Question (#239) (#242)
When a catch-up PR merge fails (PR stays open after the merge command), the loop sets the issue to State/Question and comments on it. But on the next cron tick the same PR is still open with passing CI, so it tries again — spamming the issue with identical comments every minute.

Fix: before attempting a catch-up merge, fetch the issue's current labels via `_get_issue_labels()`. If `State/Question` is already set, skip the PR entirely.

Closes #239

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/242
2026-05-25 09:21:23 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 f4a052bedc feat: add State/ToPlan planning phase to agent loop
Issues labelled State/ToPlan are now picked up by a dedicated planning
agent before any implementation happens. The agent posts a plan as an
issue comment, then the loop transitions the label to State/Planned and
leaves a resume command in a follow-up comment. A human reviews the plan
and manually promotes the issue to State/Ready to trigger implementation.

Planning agents run at higher priority than Ready issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 18:56:46 +02:00
Thomas SharedInbox b2c11e0c63 Revert "feat: keep secrets in sync via age-encrypted master key (#208) (#223)"
This reverts commit 96b1660b59.
2026-05-24 18:39:23 +02:00
Bot of Thomas Güttler 96b1660b59 feat: keep secrets in sync via age-encrypted master key (#208) (#223) 2026-05-24 16:35:10 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d9b8748631 fix: filter _latest_main_ci_run by workflow_id == ci.yml
Forgejo reports deploy.yml (scheduled/dispatch) runs with event=push
and prettyref=main, identical to ci.yml push runs. The event-only
filter was insufficient — adding workflow_id == "ci.yml" prevents
deploy.yml runs from blocking or triggering false CI fix agents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 15:07:00 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 77e581299d fix: filter out schedule/deploy workflow runs in CI checks
_latest_main_ci_run() was using event != pull_request which still
matched deploy.yml schedule runs when their prettyref == "main",
blocking the loop from picking up new issues.

_latest_ci_run_for_branch() had the same issue: the else branch matched
any non-pull_request event including schedule runs.

Both functions now explicitly filter for event == "push" only.

Tests updated: rename _latest_ci_run → _latest_main_ci_run, mock
_open_issue_prs to prevent real API calls in unit tests, and update
_find_pr_for_branch side_effect to reflect the upstream post-merge
PR-still-open verification check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 14:08:13 +02:00
Bot of Thomas Güttler 37eca207c6 fix: pin SSH host key via known_hosts instead of StrictHostKeyChecking=no (#161) (#181) 2026-05-24 13:00:04 +02:00
Bot of Thomas Güttler 5925cee4f2 fix: show git hash as clickable link above stacktrace (#201) (#211) 2026-05-24 12:56:27 +02:00
Bot of Thomas Güttler a8603edfc3 fix: verify PID belongs to claude before SIGKILL (#160) (#163) 2026-05-24 12:55:08 +02:00
Bot of Thomas Güttler 0293cb5845 fix: stop retrying on MissingPluginException from flutter_secure_storage (#200) (#209) 2026-05-24 08:50:06 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 c517f604e0 test: update deploy_playstore tests for requests-based transport
The previous tests patched google_auth_httplib2 and googleapiclient which
no longer exist in the new implementation. Rewrite to mock AuthorizedSession
and _upload_aab_resumable, covering the same scenarios: happy path, retry
on transient errors, backoff delays, and exhausted attempts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 07:40:17 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 7d393ec818 fix: switch Play Store upload from httplib2 to requests
httplib2 treats 308 Resume Incomplete responses (used by Google's
resumable upload API) as redirects and raises RedirectMissingLocation
when the response lacks a Location header. Switch to
google.auth.transport.requests.AuthorizedSession + direct HTTP calls
so the upload uses the requests library, which handles 308 correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 07:32:22 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 80cde04d87 fix: retry AAB upload on RedirectMissingLocation with exponential backoff (#186)
Wrap the resumable bundle upload in a loop of up to _MAX_UPLOAD_ATTEMPTS (3)
attempts. On httplib2.error.RedirectMissingLocation, recreate MediaFileUpload
(resumable uploads cannot reuse the same object) and wait 10 s / 20 s before
retrying. After all attempts are exhausted, raise RuntimeError chained to the
last exception. Add tests covering the retry path, backoff delays, fresh
MediaFileUpload on each attempt, and exhaustion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 04:59:05 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 fb6f2cca68 fix: add timeout and retries to Play Store upload (#185)
Switch deploy_playstore.py from requests/AuthorizedSession to the
googleapiclient.discovery client with google-auth-httplib2, so that
AuthorizedHttp(timeout=300) enforces a hard socket timeout on all
requests and num_retries=3 on every .execute() call enables automatic
retries for transient failures.

Update flake.nix and ci/main.go to install the new dependencies
(google-api-python-client, google-auth-httplib2, httplib2) instead of
the old google-auth + requests pair.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 04:38:36 +02:00
Bot of Thomas Güttler 71ccf24d0c fix: survive permanently broken path_provider channel on Android (#192) (#194) 2026-05-24 03:50:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 6e22683f5b fix(crash_screen): remove duplicate gitLine definition left by rebase conflict resolution
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 17:02:39 +02:00
Bot of Thomas Güttler 19d8d282ba fix: show UUID in agent-loop resume command (#152) (#176) 2026-05-23 15:20:08 +02:00
Bot of Thomas Güttler aa59dbb852 feat: show CI run link in 'CI passed' message (#151) (#174) 2026-05-23 15:05:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 5ad6599951 fix(agent_loop): match CI run to PR branch via event_payload, not head_branch
The Forgejo workflow_runs API has no head_branch field.  For pull_request
events the branch lives in event_payload["pull_request"]["head"]["ref"];
for push events it is in prettyref.  The old code used run.get("head_branch")
which always returned None, causing _latest_ci_run_for_branch to never find
the run and the loop to declare "no CI run after 15 min" and set the issue to
State/Question — even when CI had already passed.

Also fixes a pre-existing test mock that was missing the session_name kwarg.
Adds TestLatestCiRunForBranch covering both event types and the regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 13:36:21 +02:00
Thomas SharedInbox 55c15177d8 fix(publish-website): survive SSH failure in generate_build_history (#164)
The Dagger container running generate_build_history.py may not always
reach the deployment server (network constraints on the Dagger engine).
Rather than aborting the entire publish-website pipeline, log the SSH
verbose output (already added in the previous debug commit) and return
an empty file list so Hugo still builds and rsync still deploys the
site — just without updated build-history pages.

This unblocks the cron deploy that has been failing since c259d2da.
2026-05-23 12:17:58 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 54cd6623c4 debug(ci): add ssh -v to generate_build_history for exit-255 diagnosis
Temporary: print verbose SSH output on failure to identify why the
connection fails from inside the dagger container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:13:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 565b6f8e33 fix(publish-website): add -i to ssh call in generate_build_history.py
All other ssh/scp calls in the dagger module use explicit -i /root/.ssh/id_ed25519.
This one was missing it, causing exit 255 inside the dagger container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:02:30 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b6a2f91820 security: fix log/state file permissions, Firebase key on disk, TLS cleanup
- agent_loop.py: create log dir with mode 0700 and enforce it on
  existing dirs; open log files with mode 0600; chmod state file
  to 0600 after every write. Prevents other local processes from
  reading agent output (which may contain credential paths) or
  tampering with the state file's pid field.

- ci/main.go (TestAndroidFirebase): replace
    echo "$FIREBASE_SA_KEY" > /tmp/key.json
  with bash process substitution
    --key-file=<(echo "$FIREBASE_SA_KEY")
  The key is now passed via a file descriptor — it never touches
  disk, so it cannot be stranded by a failed gcloud auth call or
  snapshotted into the Dagger layer cache.

- ci.yml / deploy.yml: add "Cleanup TLS credentials" step
  (if: always()) at the end of every job that calls
  setup_dagger_remote.sh. Removes /tmp/dagger-tls,
  /tmp/stunnel-dagger.conf, /tmp/stunnel.pid from the self-hosted
  runner after each job, so client certs do not accumulate between
  job runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 10:54:53 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 1a7b585dd4 fix(agent-loop): filter issues by author; comment when setting State/Question (#158)
- Only pick up issues created by guettli, guettlibot, or guettlibot2
  to prevent the loop from acting on external/bot issues.
- Post an explanatory comment on the issue whenever the loop sets
  State/Question (agent killed, no CI run, no push detected), so the
  reason is visible without digging through cron logs. Closes #158.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 10:04:44 +02:00