Commit Graph
122 Commits
Author SHA1 Message Date
Thomas Güttler 8ee411d1c8 fix: use --output-type json for SOPS decryption 2026-06-02 12:45:34 +02:00
Thomas Güttler 1e2d1b6063 chore: migrate to SOPS and SSH for Dagger engine access 2026-06-02 11:10:29 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 ea5d119706 fix: add timeouts to dagger query, docker info, and portfile loop (#347)
Three unguarded blocking calls caused CI to hang until the 60-min timeout:
- dagger query prune steps had no timeout; || true only catches errors, not hangs
- docker info (added in d905cd6) had no timeout if Docker socket is unresponsive
- until portfile loop in check-dagger spun forever if otel-receiver.py crashed

Fixes: timeout 120 on all dagger query prune calls, timeout 30 on docker info,
and a kill -0 process-alive guard on the portfile until loop with fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 21:43:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 968db75c69 feat: replace agent_loop.py with agentloop
Switch from the bespoke 1136-line Python orchestrator to the community
agentloop tool (https://github.com/guettli/agentloop). The new tool
handles the issue → agent → PR pipeline via a label state machine using
loop/plan and loop/code labels, running every 5 minutes via cron.

Removes: scripts/agent_loop.py, scripts/test_agent_loop.py
Removes: .forgejo/workflows/monitor.yml (no heartbeat concept in agentloop)
Updates: AGENTS.md to document the new loop/ label workflow

agentloop config lives in ~/agentloop/loop/sharedinbox/ on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 09:20:48 +02:00
Bot of Thomas Güttler d905cd653f fix: check Docker availability before falling back to local Dagger engine (#329) (#333) 2026-05-29 23:19:14 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 e21cde0a3c fix: allow forgejo-actions as issue author in agent loop
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 21:52:56 +02:00
Bot of Thomas Güttler 50a6678ec2 feat: reimplement user preferences, archive, configurable navigation (#315) (#324) 2026-05-29 19:08:12 +02:00
Bot of Thomas Güttler adc4eb6f6d feat: remove publish-website from deploy.yml, schedule website.yml hourly (#325) (#330) 2026-05-29 12:53:18 +02:00
Bot of Thomas Güttler c45775be92 fix: move sync health report to own row below each account (#311) (#322) 2026-05-28 06:53:11 +02:00
Bot of Thomas Güttler a5928c1aa6 fix: add _tea_get and merged-PR catch-up to close issues on merge (#305) (#310) 2026-05-28 00:07:13 +02:00
Bot of Thomas Güttler 7f3cd43d6e feat: add --dangerously-skip-permissions to claude --resume output (#304) (#309) 2026-05-27 23:48:12 +02:00
Bot of Thomas Güttler 41550eb4b5 feat: configurable menu bar position for mailbox view (#298) (#303) 2026-05-27 22:07:12 +02:00
Bot of Thomas Güttler 5ddfe68467 feat: catch up Renovate PRs with passing CI in agent loop (#289) (#293) 2026-05-27 20:09:13 +02:00
Bot of Thomas Güttler c0dd13be5d feat: align single and multi-mail actions, add archive (#287) (#291) 2026-05-27 19:36:13 +02:00
Bot of Thomas Güttler 73bbfd2694 fix: add explicit note that app settings are never uploaded (#280) (#281) 2026-05-27 08:25:20 +02:00
Thomas SharedInbox 49e6b335d9 better err msg in agent-loop. 2026-05-27 08:14:42 +02:00
Bot of Thomas Güttler f57a8c502d feat: syncLog add Copy button, stack trace, isPermanent (#266) (#269) 2026-05-26 07:55:07 +02:00
Bot of Thomas Güttler 8709e9f38d feat: add Locale, Text Scale, DB Schema Version, Device Model to About page (#258) (#263) 2026-05-25 22:18:09 +02:00
Bot of Thomas Güttler bb475a2350 fix: auto-resolve merge failures instead of asking for manual merge (#253) (#256) 2026-05-25 19:38:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 06df3ee200 feat: monitor agent loop health every 2 hours (#217)
- Track a heartbeat timestamp in ~/.sharedinbox-agent-heartbeat at the
  start of each _run_loop() invocation so we can tell when it last ran.
- Add `agent_loop.py monitor` subcommand that exits 1 with a WARNING
  message if the heartbeat is missing, corrupted, or older than 2 hours.
- Add .forgejo/workflows/monitor.yml scheduled workflow that runs the
  monitor check every 2 hours on the self-hosted runner; a CI failure
  serves as the warning when the loop is stalled.
- Add 7 unit tests covering all monitor / heartbeat scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 12:48:45 +02:00
27bef3356e fix: skip catch-up merge retry when issue has State/Question (#239) (#242)
When a catch-up PR merge fails (PR stays open after the merge command), the loop sets the issue to State/Question and comments on it. But on the next cron tick the same PR is still open with passing CI, so it tries again — spamming the issue with identical comments every minute.

Fix: before attempting a catch-up merge, fetch the issue's current labels via `_get_issue_labels()`. If `State/Question` is already set, skip the PR entirely.

Closes #239

Co-authored-by: Thomas SharedInbox <sharedinbox@thomas-guettler.de>
Reviewed-on: https://codeberg.org/guettli/sharedinbox/pulls/242
2026-05-25 09:21:23 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 f4a052bedc feat: add State/ToPlan planning phase to agent loop
Issues labelled State/ToPlan are now picked up by a dedicated planning
agent before any implementation happens. The agent posts a plan as an
issue comment, then the loop transitions the label to State/Planned and
leaves a resume command in a follow-up comment. A human reviews the plan
and manually promotes the issue to State/Ready to trigger implementation.

Planning agents run at higher priority than Ready issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 18:56:46 +02:00
Thomas SharedInbox b2c11e0c63 Revert "feat: keep secrets in sync via age-encrypted master key (#208) (#223)"
This reverts commit 96b1660b59.
2026-05-24 18:39:23 +02:00
Bot of Thomas Güttler 96b1660b59 feat: keep secrets in sync via age-encrypted master key (#208) (#223) 2026-05-24 16:35:10 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d9b8748631 fix: filter _latest_main_ci_run by workflow_id == ci.yml
Forgejo reports deploy.yml (scheduled/dispatch) runs with event=push
and prettyref=main, identical to ci.yml push runs. The event-only
filter was insufficient — adding workflow_id == "ci.yml" prevents
deploy.yml runs from blocking or triggering false CI fix agents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 15:07:00 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 77e581299d fix: filter out schedule/deploy workflow runs in CI checks
_latest_main_ci_run() was using event != pull_request which still
matched deploy.yml schedule runs when their prettyref == "main",
blocking the loop from picking up new issues.

_latest_ci_run_for_branch() had the same issue: the else branch matched
any non-pull_request event including schedule runs.

Both functions now explicitly filter for event == "push" only.

Tests updated: rename _latest_ci_run → _latest_main_ci_run, mock
_open_issue_prs to prevent real API calls in unit tests, and update
_find_pr_for_branch side_effect to reflect the upstream post-merge
PR-still-open verification check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 14:08:13 +02:00
Bot of Thomas Güttler 37eca207c6 fix: pin SSH host key via known_hosts instead of StrictHostKeyChecking=no (#161) (#181) 2026-05-24 13:00:04 +02:00
Bot of Thomas Güttler 5925cee4f2 fix: show git hash as clickable link above stacktrace (#201) (#211) 2026-05-24 12:56:27 +02:00
Bot of Thomas Güttler a8603edfc3 fix: verify PID belongs to claude before SIGKILL (#160) (#163) 2026-05-24 12:55:08 +02:00
Bot of Thomas Güttler 0293cb5845 fix: stop retrying on MissingPluginException from flutter_secure_storage (#200) (#209) 2026-05-24 08:50:06 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 c517f604e0 test: update deploy_playstore tests for requests-based transport
The previous tests patched google_auth_httplib2 and googleapiclient which
no longer exist in the new implementation. Rewrite to mock AuthorizedSession
and _upload_aab_resumable, covering the same scenarios: happy path, retry
on transient errors, backoff delays, and exhausted attempts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 07:40:17 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 7d393ec818 fix: switch Play Store upload from httplib2 to requests
httplib2 treats 308 Resume Incomplete responses (used by Google's
resumable upload API) as redirects and raises RedirectMissingLocation
when the response lacks a Location header. Switch to
google.auth.transport.requests.AuthorizedSession + direct HTTP calls
so the upload uses the requests library, which handles 308 correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 07:32:22 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 80cde04d87 fix: retry AAB upload on RedirectMissingLocation with exponential backoff (#186)
Wrap the resumable bundle upload in a loop of up to _MAX_UPLOAD_ATTEMPTS (3)
attempts. On httplib2.error.RedirectMissingLocation, recreate MediaFileUpload
(resumable uploads cannot reuse the same object) and wait 10 s / 20 s before
retrying. After all attempts are exhausted, raise RuntimeError chained to the
last exception. Add tests covering the retry path, backoff delays, fresh
MediaFileUpload on each attempt, and exhaustion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 04:59:05 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 fb6f2cca68 fix: add timeout and retries to Play Store upload (#185)
Switch deploy_playstore.py from requests/AuthorizedSession to the
googleapiclient.discovery client with google-auth-httplib2, so that
AuthorizedHttp(timeout=300) enforces a hard socket timeout on all
requests and num_retries=3 on every .execute() call enables automatic
retries for transient failures.

Update flake.nix and ci/main.go to install the new dependencies
(google-api-python-client, google-auth-httplib2, httplib2) instead of
the old google-auth + requests pair.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 04:38:36 +02:00
Bot of Thomas Güttler 71ccf24d0c fix: survive permanently broken path_provider channel on Android (#192) (#194) 2026-05-24 03:50:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 6e22683f5b fix(crash_screen): remove duplicate gitLine definition left by rebase conflict resolution
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 17:02:39 +02:00
Bot of Thomas Güttler 19d8d282ba fix: show UUID in agent-loop resume command (#152) (#176) 2026-05-23 15:20:08 +02:00
Bot of Thomas Güttler aa59dbb852 feat: show CI run link in 'CI passed' message (#151) (#174) 2026-05-23 15:05:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 5ad6599951 fix(agent_loop): match CI run to PR branch via event_payload, not head_branch
The Forgejo workflow_runs API has no head_branch field.  For pull_request
events the branch lives in event_payload["pull_request"]["head"]["ref"];
for push events it is in prettyref.  The old code used run.get("head_branch")
which always returned None, causing _latest_ci_run_for_branch to never find
the run and the loop to declare "no CI run after 15 min" and set the issue to
State/Question — even when CI had already passed.

Also fixes a pre-existing test mock that was missing the session_name kwarg.
Adds TestLatestCiRunForBranch covering both event types and the regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 13:36:21 +02:00
Thomas SharedInbox 55c15177d8 fix(publish-website): survive SSH failure in generate_build_history (#164)
The Dagger container running generate_build_history.py may not always
reach the deployment server (network constraints on the Dagger engine).
Rather than aborting the entire publish-website pipeline, log the SSH
verbose output (already added in the previous debug commit) and return
an empty file list so Hugo still builds and rsync still deploys the
site — just without updated build-history pages.

This unblocks the cron deploy that has been failing since c259d2da.
2026-05-23 12:17:58 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 54cd6623c4 debug(ci): add ssh -v to generate_build_history for exit-255 diagnosis
Temporary: print verbose SSH output on failure to identify why the
connection fails from inside the dagger container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:13:26 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 565b6f8e33 fix(publish-website): add -i to ssh call in generate_build_history.py
All other ssh/scp calls in the dagger module use explicit -i /root/.ssh/id_ed25519.
This one was missing it, causing exit 255 inside the dagger container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:02:30 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b6a2f91820 security: fix log/state file permissions, Firebase key on disk, TLS cleanup
- agent_loop.py: create log dir with mode 0700 and enforce it on
  existing dirs; open log files with mode 0600; chmod state file
  to 0600 after every write. Prevents other local processes from
  reading agent output (which may contain credential paths) or
  tampering with the state file's pid field.

- ci/main.go (TestAndroidFirebase): replace
    echo "$FIREBASE_SA_KEY" > /tmp/key.json
  with bash process substitution
    --key-file=<(echo "$FIREBASE_SA_KEY")
  The key is now passed via a file descriptor — it never touches
  disk, so it cannot be stranded by a failed gcloud auth call or
  snapshotted into the Dagger layer cache.

- ci.yml / deploy.yml: add "Cleanup TLS credentials" step
  (if: always()) at the end of every job that calls
  setup_dagger_remote.sh. Removes /tmp/dagger-tls,
  /tmp/stunnel-dagger.conf, /tmp/stunnel.pid from the self-hosted
  runner after each job, so client certs do not accumulate between
  job runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 10:54:53 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 1a7b585dd4 fix(agent-loop): filter issues by author; comment when setting State/Question (#158)
- Only pick up issues created by guettli, guettlibot, or guettlibot2
  to prevent the loop from acting on external/bot issues.
- Post an explanatory comment on the issue whenever the loop sets
  State/Question (agent killed, no CI run, no push detected), so the
  reason is visible without digging through cron logs. Closes #158.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 10:04:44 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 9cd18ba70e feat: agent loop uses PRs; ci.yml fast-only; hourly deploy workflow (#156)
- agent_loop.py: agents now create an `issue-N-fix` branch and open a PR;
  the loop discovers the PR via `fgj pr list`, tracks its CI run, squash-merges
  on green, and falls back to the global-CI path if no PR exists (backward compat).
  Adds `_find_pr_for_branch`, `_latest_ci_run_for_branch`, `_merge_pr` helpers.

- .forgejo/workflows/ci.yml: strip to the single fast `check` job only
  (removes build-linux, deploy-playstore, publish-website).

- .forgejo/workflows/deploy.yml (new, replaces android-emulator-tests.yml):
  scheduled hourly + workflow_dispatch; runs firebase tests, Play Store deploy,
  Linux build/deploy, website publish; on completion sets CI/Full-Pass or
  CI/Full-Fail label on the repo's DEPLOY_HEALTH_ISSUE tracking issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 22:05:09 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 b48cb98813 fix(agent-loop): detect agent crash — do not close issue when no new CI run appeared
If the agent exits immediately (e.g. rate-limit), the loop was closing the
pending issue against the *previous* CI run, which was still green.

Fix: record the latest CI run ID when an issue agent starts. If the run ID
hasn't changed when the agent exits, the agent pushed nothing → set
State/Question instead of closing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 21:52:02 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 7e3a63f507 ci: validate gcloud auth stderr, fail on 'error' in output, check test count (#145)
- Capture gcloud auth stderr separately and fail on unexpected output;
  ignore the two known informational lines ("Activated service account
  credentials for: [...]" and "Updated property [core/project].") while
  keeping a strict "fail if unknown stderr" check for anything else.
- Replace the narrow pattern grep (non-retryable error|infrastructure_failure|
  test execution failed) with a broad whole-word case-insensitive grep for
  'error', so any infrastructure or Firebase error in the output causes CI
  failure.
- Verify that the number of device result rows in the result table matches
  the expected device count (1), so a silent test-run failure cannot slip
  through.
- Add scripts/test_firebase_check.sh with 18 unit tests for the three new
  bash patterns (auth stderr filter, error-word detection, device count).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 16:31:14 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 cc51abd1fa fix: reduce CI noise from apt-get, sdkmanager, stunnel, and Gradle (#140)
- Add -qq to apt-get update/install in Dagger toolchain to suppress
  verbose package-list output (hundreds of lines on cold cache)
- Wrap sdkmanager in silent-on-success pattern — only shows output
  on failure, like the build_runner and flutter pub get steps
- Set debug = warning in stunnel config to suppress LOG5 (info/notice)
  startup lines while keeping LOG4 (warning) and above
- Add org.gradle.welcome=never to android/gradle.properties to
  suppress the "Welcome to Gradle N.NN!" banner
- Filter SKIPPED Gradle tasks, Gradle Daemon startup messages, and
  gcloud support-page promo lines in run_firebase_test.sh

Errors and warnings are preserved in all cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 15:37:12 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 78b3d40a70 fix(agent-loop): use fgj for writes; tea api silently ignores auth errors
`tea api` exits 0 even on 401 responses, so `_close_issue` and
`_set_labels` appeared to succeed but did nothing. Issues were never
actually closed, causing them to be picked up again every cron tick.

Switch all write operations (close issue, set labels) and issue-list
reads to `fgj`, which has proper authentication. Keep `tea api` only
for CI run fetches where `fgj` times out (504). Add ~/go/bin to the
cron PATH so fgj is found.

Also add an error check in `_tea_get` for API-level error responses,
and strip State/InProgress when closing an issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 14:22:07 +02:00
Thomas SharedInboxandClaude Sonnet 4.6 d72df5086c feat: close issues in Python loop after CI passes, not in agent (#134)
Previously issue agents were instructed to close the issue via prompt text
immediately after pushing. If CI then failed, the issue was already closed.

Now the loop tracks a pending_issue across cron ticks:
- When an agent finishes (issue or ci-fix), the issue number is extracted
  from state before it is cleared.
- If CI is still running, a "pending-ci" state preserves the issue number.
- If CI fails, the ci-fix agent is started with the issue number in state
  so it survives the fix cycle.
- Once CI passes, _close_issue() is called from Python — never by the agent.

The agent prompt no longer instructs the agent to close the issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 12:02:16 +02:00