Speed up agent loop and deploy #234

2026-05-24T19:04:59Z

guettli commented

2026-05-24 19:04:59 +00:00

(Migrated from codeberg.org)

Look at the current way the agent loop works.

Create a plan for how to speed up the flow. The final goal is to run deploy.yml.

Where could caching help?

Where could running concurrently help?

Where could calling cron jobs more often help?

Think about other ways, too.

guettlibot commented

2026-05-24 19:13:59 +00:00

(Migrated from codeberg.org)

Implementation Plan: Speed up agent loop and deploy

After reading the issue and exploring the codebase (crontab, scripts/agent_loop.py, .forgejo/workflows/deploy.yml, scripts/deploy_cron.py, Taskfile.yml, ci/main.go), here is a detailed breakdown of where time is lost and how to recover it.

Current flow and where time is wasted

The end-to-end cycle for a single issue currently looks like this (excluding actual agent work):

1. Issue gets label → cron fires → planning agent starts: up to 5 min
2. Planning agent finishes → next cron tick detects it → State/Planned set: up to 5 min
3. Human sets State/Ready → next cron tick → impl agent starts: up to 5 min
4. Impl agent finishes → next cron tick detects it → CI running: up to 5 min
5. CI passes → next cron tick detects it → PR merged: up to 5 min
6. PR merged to main → hourly schedule fires deploy.yml: up to 59 min

That is up to ~84 minutes of pure polling delay, before any actual build/deploy time.
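The worst-case figures above follow from a simple model: each of the five loop transitions waits at most one cron interval, and the final step waits at most until deploy.yml next starts. The sketch below makes that arithmetic explicit and shows how options 1 and 2 change it. The function name is illustrative, not part of the codebase, and the model ignores agent run time, build time, and however long the human takes at step 3.

```python
def worst_case_polling_delay(cron_interval_min: int, deploy_lag_min: int,
                             loop_transitions: int = 5) -> int:
    """Upper bound on idle waiting (minutes) for one issue.

    Steps 1-5 each wait at most one cron interval; step 6 waits at most
    `deploy_lag_min` for deploy.yml to start. Agent and build time excluded.
    """
    return loop_transitions * cron_interval_min + deploy_lag_min


if __name__ == "__main__":
    # Immediate dispatch after merge is modelled as zero deploy lag
    # (runner pickup time is ignored).
    print("today (*/5, hourly deploy):", worst_case_polling_delay(5, 59))
    print("option 1 (*/1, hourly deploy):", worst_case_polling_delay(1, 59))
    print("options 1+2 (*/1, deploy on merge):", worst_case_polling_delay(1, 0))
```

With these inputs the bound drops from 84 minutes today to 64 minutes with option 1 alone, and to 5 minutes with options 1 and 2 together.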

1. Increase cron frequency for `agent_loop.py` (Quick win)

File: user crontab
Change: */5 * * * * → */1 * * * *

Each state transition in the loop costs up to one cron interval of idle waiting. Steps 1–5 above each burn up to 5 minutes. With 1-minute intervals, those five steps shrink to at most 5 minutes total (instead of 25).

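Risk: 5× more invocations. Each run makes 2–4 Codeberg API calls via tea and fgj, so at one run per minute the loop makes at most about 240 API calls per hour (60 runs × 4 calls). This is negligible for a single-user instance and should stay well within Codeberg's rate limits.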

Note: The docstring in agent_loop.py still says "every 10 minutes", but the crontab already runs it every 5 minutes. The docstring should be updated to match whichever interval is chosen.
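With a one-minute interval, a tick that takes longer than a minute could overlap with the next one. If agent_loop.py does not already prevent that, a non-blocking `flock` is a common way to make a second invocation exit immediately. A minimal sketch, assuming a POSIX host; the helper names and the lock path are hypothetical:

```python
import fcntl
from typing import IO, Callable, Optional


def acquire_single_instance_lock(path: str) -> Optional[IO[str]]:
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file (the lock is held while it stays open) or None if
    another open file description, in any process, already holds the lock.
    """
    lock_file = open(path, "a")
    try:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


def run_exclusively(lock_path: str, body: Callable[[], None]) -> bool:
    """Run `body` only if no other tick holds the lock; return whether it ran."""
    lock = acquire_single_instance_lock(lock_path)
    if lock is None:
        print("Previous agent_loop.py tick still running; skipping.")
        return False
    with lock:  # closing the file releases the flock
        body()
    return True
```

In agent_loop.py this would wrap the existing entry point, for example `run_exclusively(os.path.expanduser("~/.sharedinbox-agent-loop.lock"), main)`, where `main` stands for whatever function the script currently runs (name assumed). The kernel releases the lock when the process exits, so a crashed tick cannot leave a stale lock behind.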

2. Trigger `deploy.yml` immediately after PR merge (Highest impact)

File: scripts/agent_loop.py
Change: After every successful _merge_pr() call, trigger deploy.yml immediately via the Forgejo API, using the fgj CLI that the loop already calls.

deploy.yml is currently scheduled hourly (0 * * * *), meaning up to 59 minutes pass between a PR merging to main and apps being deployed. Triggering it from the agent loop removes this gap for every merge the loop performs.

Add a helper and call it after both merge calls (one in section 2b, the post-agent merge, and one in the catch-up section):

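```python
import subprocess


def _trigger_deploy() -> None:
    """Best-effort dispatch of deploy.yml on main; never fails the loop."""
    result = subprocess.run(
        ["fgj", "--hostname", "codeberg.org", "actions", "workflow", "run",
         "deploy.yml", "--ref", "main", "--repo", REPO],  # REPO: existing constant in agent_loop.py
        capture_output=True, text=True, check=False,
    )
    if result.returncode == 0:
        print("Triggered deploy.yml.")
    else:
        print(f"Could not trigger deploy.yml: {result.stderr.strip()}")
```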

Risk: If multiple PRs merge in quick succession, deploy.yml could be triggered several times within minutes. A workflow_dispatch run sets both build flags to true (see option 4), so check-changes will not skip these runs; the redundant builds are mostly served from Dagger's cache, but each run still occupies the runner. Add a guard if desired: check whether a deploy.yml run started within the last N minutes before triggering again.
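A simple form of that guard is to remember when the loop last dispatched deploy.yml. This sketch stores a timestamp in a local marker file instead of querying the Forgejo API for recent runs; the file path, the 10-minute default, and the function names are assumptions for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Optional

# Hypothetical marker file holding the epoch seconds of the last dispatch.
LAST_DEPLOY_FILE = Path.home() / ".sharedinbox-last-deploy-trigger"


def should_trigger_deploy(now: float, last_trigger: Optional[float],
                          min_interval_s: float = 600.0) -> bool:
    """True if no deploy was dispatched within the last `min_interval_s` seconds."""
    return last_trigger is None or now - last_trigger >= min_interval_s


def read_last_trigger(path: Path = LAST_DEPLOY_FILE) -> Optional[float]:
    try:
        return float(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # never triggered, or unreadable marker: allow a trigger


def maybe_trigger_deploy(trigger: Callable[[], None],
                         path: Path = LAST_DEPLOY_FILE) -> bool:
    """Call `trigger` unless a dispatch happened recently; return whether it ran."""
    now = time.time()
    if not should_trigger_deploy(now, read_last_trigger(path)):
        print("deploy.yml was triggered recently; skipping.")
        return False
    trigger()
    # Recorded even if the best-effort dispatch failed; a merge after the
    # interval will try again.
    path.write_text(str(now))
    return True
```

After each merge the loop would call `maybe_trigger_deploy(_trigger_deploy)` instead of calling `_trigger_deploy()` directly. Keeping the time check in a pure function makes it testable without touching the network.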

3. Add a `push` trigger to `deploy.yml` (Complementary to #2)

File: .forgejo/workflows/deploy.yml
Change: Add a push trigger with path filters alongside the existing schedule:

```yaml
on:
  schedule:
    - cron: '0 * * * *'
  push:
    branches: [main]
    paths:
      - 'android/**'
      - 'integration_test/**'
      - 'lib/**'
      - 'pubspec.yaml'
      - 'pubspec.lock'
      - 'drift_schemas/**'
      - 'scripts/deploy_playstore.py'
      - 'linux/**'
  workflow_dispatch:
```

This is complementary to option #2 (the loop trigger fires even for non-source changes; the push trigger fires only when relevant files change). Either approach eliminates the hourly wait; both together give belt-and-suspenders coverage, at the cost of two deploy.yml runs for a loop merge that touches source files.

Risk: The paths filter plays the same role as check-changes: if a push touches neither Android nor Linux source files, deploy.yml won't run for it. The existing hourly schedule still covers edge cases (e.g., infra/config changes not covered by the path filter).

4. Fix the hourly change-detection window in `deploy.yml`

File: .forgejo/workflows/deploy.yml, check-changes job
Current bug: git diff --name-only HEAD~1 HEAD compares only the last commit. On the hourly schedule, if two PRs merged since the last run, the first PR's file changes are invisible. Its Android/Linux changes will be silently skipped, and no build will fire.

Fix (simplest): For scheduled runs, always build. workflow_dispatch already sets both flags to true. Extend that logic:

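```bash
# Assumes the GitHub-compatible event context and step-output file that Forgejo Actions provides.
if [ "${{ github.event_name }}" = "workflow_dispatch" ] || [ "${{ github.event_name }}" = "schedule" ]; then
  echo "android=true" >> "$GITHUB_OUTPUT"
  echo "linux=true"   >> "$GITHUB_OUTPUT"
  exit 0
fi
```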

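Dagger's caching keeps redundant builds cheap: if nothing changed, the expensive steps (Gradle compilation, Flutter build) are replayed from the cache volumes of the self-hosted runner's Dagger engine. The deploy jobs after the build would, however, also run every hour, so it is worth confirming that they tolerate re-publishing an unchanged build.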

Alternative fix: Increase fetch-depth and diff from the last successful deploy's SHA, stored in a file on the runner. More precise, more complex.
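A sketch of this alternative, assuming the check-changes job checks out full history (`fetch-depth: 0`) and the runner keeps the last successfully deployed SHA in a file. The split of the option 3 path list into Android-only, Linux-only, and shared patterns is an assumption, not taken from the existing check-changes script, and all names here are hypothetical.

```python
import fnmatch
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Assumed split of the option-3 path filter into per-target patterns.
ANDROID_ONLY = ["android/**", "integration_test/**", "scripts/deploy_playstore.py"]
LINUX_ONLY = ["linux/**"]
SHARED = ["lib/**", "pubspec.yaml", "pubspec.lock", "drift_schemas/**"]


def _matches(path: str, patterns: Iterable[str]) -> bool:
    # fnmatch's "*" also matches "/", so "lib/**" covers every file below lib/.
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


def classify(changed: Iterable[str]) -> Tuple[bool, bool]:
    """Return (android, linux) build flags for a list of changed paths."""
    android = linux = False
    for path in changed:
        shared = _matches(path, SHARED)
        android = android or shared or _matches(path, ANDROID_ONLY)
        linux = linux or shared or _matches(path, LINUX_ONLY)
    return android, linux


def changed_since(last_sha: Optional[str]) -> Optional[List[str]]:
    """Files changed between the last deployed commit and HEAD, or None if unknown."""
    if not last_sha:
        return None
    result = subprocess.run(["git", "diff", "--name-only", last_sha, "HEAD"],
                            capture_output=True, text=True, check=False)
    if result.returncode != 0:  # e.g. the SHA is not in the fetched history
        return None
    return result.stdout.splitlines()


def build_flags(state_file: Path) -> Tuple[bool, bool]:
    """Flags for this run; build everything when there is no usable baseline."""
    last_sha = state_file.read_text().strip() if state_file.exists() else None
    changed = changed_since(last_sha)
    if changed is None:
        return True, True
    return classify(changed)
```

The job would then append `android=...` and `linux=...` lines to its step output file, and a final step would overwrite the stored SHA only after the deploy jobs succeed, so a failed deploy is retried on the next run. Falling back to building everything when no baseline exists mirrors the `workflow_dispatch` behaviour.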

5. Allow concurrent planning + implementation agents (Throughput)

File: scripts/agent_loop.py
Change: Replace the single-agent state file with separate state tracking for plan agents and impl agents, allowing one of each to run simultaneously.

Currently the loop starts either a plan agent or an impl agent per tick, never both. When a planning agent is running, no implementation work happens — even if a separate issue with State/Ready is waiting.

Approach: Use two state files (~/.sharedinbox-plan-state.json and ~/.sharedinbox-impl-state.json), or store a list in the existing state file. The loop logic becomes:

- If no plan agent running and a ToPlan issue exists → start plan agent
- If no impl agent running and a Ready issue exists (and CI is clean) → start impl agent
- Both can run simultaneously

Risk: Two agents running task check concurrently both invoke Dagger. Since Dagger supports concurrent access to its cache volumes, this is generally safe but may slow individual runs on a resource-constrained runner. A ci-fix agent should remain mutually exclusive with impl agents (to avoid main-branch conflicts). This is the most complex change here.
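The tick logic above, plus the ci-fix exclusion, could look roughly like this. The sketch assumes each state file is JSON containing the agent's `pid` (the real layout of the existing state file may differ), and the `Tick` fields stand in for whatever the loop already fetches from Codeberg; all names are hypothetical. Starting the ci-fix agent itself is left out.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

# State file names as proposed above; the JSON layout ({"pid": ...}) is assumed.
PLAN_STATE = Path.home() / ".sharedinbox-plan-state.json"
IMPL_STATE = Path.home() / ".sharedinbox-impl-state.json"


def agent_running(state_file: Path) -> bool:
    """True if the state file names a PID that is still alive."""
    try:
        pid = int(json.loads(state_file.read_text())["pid"])
    except (FileNotFoundError, KeyError, ValueError, TypeError):
        return False  # no state, or unreadable state: treat as not running
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


@dataclass
class Tick:
    plan_running: bool
    impl_running: bool
    ci_fix_running: bool
    toplan_issue: Optional[int]
    ready_issue: Optional[int]
    ci_clean: bool


def agents_to_start(t: Tick) -> List[str]:
    """Decide which agents to launch on this tick; plan and impl may both start."""
    start = []
    if not t.plan_running and t.toplan_issue is not None:
        start.append("plan")
    # ci-fix stays mutually exclusive with impl to avoid conflicting work on main.
    if (not t.impl_running and not t.ci_fix_running
            and t.ready_issue is not None and t.ci_clean):
        start.append("impl")
    return start
```

Checking liveness with `os.kill(pid, 0)` is a cheap heuristic that can in principle be fooled by PID reuse; storing the process start time alongside the PID would tighten it.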

Recommendation: Only worth doing once there is a persistent backlog of issues; for typical use (a few issues at a time), options 1–4 will provide more practical speed-up with far less complexity.

6. Minor: decouple `publish-website` from Firebase test results

File: .forgejo/workflows/deploy.yml
Current: publish-website depends on [build-linux, deploy-playstore, deploy-apk]. If Firebase tests fail (a separate job), deploy-playstore skips/fails, which in turn blocks publish-website.

The website publish (Hugo build + rsync) is independent of whether Android tests pass. Consider gating publish-website only on the build jobs that actually produced deployable artifacts, not on test results.

Summary and recommended order

| # | Change | Files | Impact | Effort | Risk |
|---|--------|-------|--------|--------|------|
| 1 | Cron `*/1` | crontab | Saves ~20 min per full cycle | 1 line | Very low |
| 2 | Trigger deploy.yml after merge | `agent_loop.py` | Eliminates up to 59 min deploy lag | ~15 lines | Low |
| 3 | Push trigger on deploy.yml | `deploy.yml` | Same as #2 via CI | ~10 lines | Low |
| 4 | Fix hourly diff window | `deploy.yml` | Correctness: no missed builds | ~5 lines | Low |
| 5 | Parallel plan+impl agents | `agent_loop.py` | Throughput with multiple issues | Large refactor | Medium |
| 6 | Decouple publish-website | `deploy.yml` | Website deploys more reliably | ~5 lines | Low |

Start with 1, 2, 4 — these are the highest-value, lowest-risk changes and can all be implemented in one small PR. Add option 3 if belt-and-suspenders deploy triggering is desired. Defer option 5 until there is a real throughput bottleneck.

guettlibot commented

2026-05-24 19:15:02 +00:00

(Migrated from codeberg.org)

Planning complete. To resume this session:

claude --resume 8cc2138f-1fe8-46e5-8ad4-188ab397e80c

Reference: guettli/sharedinbox#234
