Agent loop, add label Question, avoid endless loop #239
Closed
opened 2026-05-25 05:52:22 +00:00 by guettli
·
3 comments
No Branch/Tag Specified
main
issue-563-agentloop-validation
dummy-pr-test
issue-560-fix-firebase-run-url
issue-539-stable-imap-uid
issue-533-shared-email-list
plan-issue-555
drop-nix
plan-issue-484
plan-issue-539
plan-issue-535
plan-issue-474
plan-issue-533
fix-dagger-engineless-precommit
issue-521-fix-deploy-yml-wait-time-api
issue-502-fix-email-id-collision-mailbox
issue-492-eliminate-duplicate-build-runner
issue-494-website-change-detection
issue-491-parallelize-check
issue-478-fix-stalwart-dual-stack-bind
issue-475-allowed-addresses-glob
issue-473-search-result-reorder
issue-453-update-agentloop-defaults
issue-466-structured-search
issue-505-exclude-chaos-monkey-from-regular-ci
issue-509-fix-search-result-sorting
fix-ink-sparkle-remaining-tests
issue-506-fix-search-emails-tests
issue-504-runner-wait-time
issue-488-search-notes
issue-472-changelog-issue-links
issue-501-folder-search-local-sqlite
issue-486-fix-stale-test-shader-mismatch
fix/prevent-settled-search-rerun-473
issue-467-fix-search-stale-results
issue-446-installed-versions-in-changelog
issue-462-fix-pr
issue-448-chaos-monkey-test
issue-436-notes-on-emails
issue-429-unify-mail-display
issue-422-move-to-folder-create-new
issue-414-ensure-not-run-as-root
issue-424-unify-email-list-views
issue-419-trusted-senders-page
issue-425-fix-prs
test-foo
issue-421-bug-report
issue-383-fix-ci
issue-394-fix-deploy-flutter-version
issue-391-fix-ci-double-trigger
issue-376-combined-inbox-v2
issue-376-combined-inbox
issue-384-fix-open-prs
sops-migrate
issue-339-safe-first-on-imap-fetch
issue-340-try-catch-measure-height
issue-342-pin-intl-version
issue-341-guard-threademails-last
issue-335-agentloop-code-test
issue-329-fix
issue-315-fix
issue-320-fix
issue-325-fix
issue-312-fix
issue-311-fix
issue-305-fix
issue-304-fix
issue-299-fix
issue-300-fix
issue-298-fix
issue-296-fix
issue-294-fix
issue-289-fix
issue-288-fix
issue-287-fix
issue-286-fix
issue-277-fix
issue-282-fix
issue-280-fix
issue-272-fix
issue-268-fix
issue-267-fix
issue-266-fix
issue-258-fix
issue-260-fix
issue-257-fix
issue-253-fix
issue-216-fix
issue-251-fix
issue-249-fix
issue-question-fixes
issue-235-fix
issue-236-fix-v2
issue-237-fix
issue-236-fix
issue-228-fix
issue-217-fix
issue-214-fix
issue-213-fix
issue-208-fix
issue-205-fix
issue-204-fix
issue-203-fix
issue-202-fix
issue-129-fix
issue-161-fix
issue-160-fix
issue-201-fix
issue-210-fix
issue-198-fix
issue-200-fix
issue-144-fix
issue-199-fix
fix/playstore-upload-use-requests
issue-193-fix
issue-186-fix
issue-185-fix
issue-192-fix
issue-183-fix
issue-175-fix
issue-172-fix
issue-171-fix
issue-167-fix
issue-136-fix
issue-162-fix
issue-179-fix
issue-155-fix
issue-154-fix
issue-152-fix
issue-151-fix
issue-141-fix
issue-150-fix
issue-164-fix
migrate-to-dagger
task/d1-ci-matrix
task/a4-typeconverter-json
task/u7-onboarding-walkthrough
task/d3-sync-doc
task/a5-layer-boundary-lint
task/t5-golden-tests
task/p5-date-cache
task/s4-link-handling
task/p3-html-parse-isolate
task/u8-mark-all-read
task/u3-recent-searches
task/a3-jmap-injectable-http-client
task/r5-tls-error-handling
fix/playstore-redirect-retry
task/t3-repository-contract-tests
task/p2-email-list-pagination
task/p1-fts5-search
fix/playstore-upload-timeout
task/a1-email-detail-notifier
fix/upgrade-workmanager-0.9
fix/android-core-library-desugaring
task/p4-db-indexes
task/r3-html-error-boundary
task/d2-check-coverage
task/a2-email-tile
task/t4-migration-tests
task/t2-widget-tests
task/t1-email-repo-coverage
task/u6-connection-status
task/u4-push-notifications
task/u2-draft-sync
task/u1-list-unsubscribe
task/s2-hostname-validation
task/r6-reliability-fuzz-tests
task/r4-sync-error-banner
task/r2-force-resync
task/r1-undo-history-persistence
No results found.
Labels
Clear labels
NeedSupervisor
State/InProgress
State/Later
State/Planned
automerge
ci-failure
do-not-merge
loop/code
loop/code-ci-pending
loop/code-done
loop/code-in-process
loop/merge
loop/merge-done
loop/merge-in-process
loop/plan
loop/plan-done
loop/plan-in-process
Issue escalated to a human supervisor; agentloop will skip it until cleared.
Eligible for automatic merge by CI
Issue opened by agentloop to track a failing CI workflow; used for deduplication.
Plan PR — review only, do not merge.
Add to run the built-in "code" prompt; override at prompts/code.md.
Prompt "code" finished; waiting for the PR's CI to pass before advancing.
Prompt "code" finished successfully.
Agent for the "code" prompt is currently running on this issue.
Managed by agentloop
Managed by agentloop
Managed by agentloop
Add to run the built-in "plan" prompt; override at prompts/plan.md.
Prompt "plan" finished successfully.
Agent for the "plan" prompt is currently running on this issue.
No labels
State/Planned
Milestone
No items
No Milestone
Projects
Clear projects
No projects
No Assignees
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: guettli/sharedinbox#239
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
There was an endless loop.
See
https://codeberg.org/guettli/sharedinbox/issues/210
Break such loops by setting label Question.
Update agent loop.
Create a plan.
Implementation Plan
Root Cause Analysis (from issue #210)
Issue #210 experienced an endless loop: PR #215 was stuck in a state where
_merge_pr()returned without error but the PR remained open. The catch-up section in_run_loop()kept finding the same open PR on every cron tick, attempting the merge, finding it still open, and posting another "Automatic merge of PR #215 failed" comment — generating 148 identical comments over ~17 hours.The current code already has a State/Question check to break such loops (line 695–697 of
scripts/agent_loop.py), but there are three gaps that allow the protection to be bypassed:Gap 1 —
_get_issue_labels()can raise and crash the loop.The check at line 695 calls
_get_issue_labels(issue_num), which calls_tea_get(), which raisesRuntimeErroron any API error or non-zero exit. An unhandled exception propagates up through_run_loop(), the cron job exits with an error, and on the next tick the same PR is found again — State/Question is never checked.Gap 2 — State/Question is NOT set when
_merge_pr()raises RuntimeError.Lines 699–702 catch the exception and
continue, but they do not set State/Question. On every subsequent cron tick, the PR is found open, CI is success, LABEL_QUESTION is not present → merge is retried indefinitely.Gap 3 — State/Question check is too narrow.
The LABEL_QUESTION check is nested inside
if pr_run and pr_run.get("status") == "success". If for any reason the CI run is absent or its status changes (e.g. re-run), the check is skipped entirely.Files to Change
Only one file:
scripts/agent_loop.pySupporting changes:
scripts/test_agent_loop.pyDetailed Changes
Change 1 — Harden State/Question check; move it before CI status checks (catch-up loop, ~line 683)
Currently:
Proposed — move the State/Question guard to the top of the loop body (before the CI-status checks), and add try/except:
This ensures the guard fires regardless of CI status, and that an API error while checking labels causes a safe skip rather than a propagating exception.
Change 2 — Set State/Question when
_merge_pr()raises in catch-up loop (~line 699)Currently:
Proposed:
Change 3 — Wrap the State/Question-setting code in the existing "still open after merge" path in try/except (~line 706)
Currently, if
_set_labels()or_comment_issue()raises in the "PR still open after merge" path, the exception propagates and the label is never reliably set. The fix is to wrap:This ensures the outer
continueis always reached even if the Codeberg API calls fail, preventing the exception from crashing the loop.Change 4 — Add comment deduplication helper
To prevent a secondary flood in edge cases where State/Question is set but keeps getting bypassed, add a helper that checks whether the most recent comment on an issue already contains a given message prefix, and skips posting if so:
Actually, checking existing comments adds API calls and complexity. A simpler alternative is to rely on the State/Question mechanism being reliable after Changes 1–3, and add a comment-rate-limit field to the state file. This change is lower priority; if Changes 1–3 are implemented, the flooding scenario should not recur. Defer comment deduplication unless testing reveals it is still needed.
Test Changes (
scripts/test_agent_loop.py)Add a new test class
TestCatchUpLoop(or extend the existing suite) covering:_get_issue_labelsraises → catch-up loop skips the PR gracefully, does not crash, does not attempt merge._merge_prraises RuntimeError → catch-up loop sets State/Question label on the issue and posts a comment.The existing test patterns in
test_agent_loop.pyuseunittest.mock.patchonagent_loop._fgj,agent_loop._tea_get, etc. — follow those conventions.Risks and Open Questions
Does State/Question conflict with other labels? The label-set call uses
remove=[LABEL_IN_PROGRESS]but does not removeLABEL_READYorLABEL_TO_PLAN. If an issue somehow has both State/Ready and State/Question, the next_ready_issues()call would still pick it up. Consider whether the set should also explicitly remove LABEL_READY and LABEL_TO_PLAN.The
_latest_ci_run_for_prcall is inside the loop and makes an API request per PR. If there are many open issue-fix PRs, this generates many API calls. Not a blocking issue now, but worth noting.Idempotency of
_set_labels: If State/Question is already set and we call_set_labels(add=[LABEL_QUESTION], ...)again, does Codeberg/fgj error? It should be a no-op, but worth verifying.The issue title says "Update agent loop" broadly. The above plan addresses the specific endless-loop bug from issue #210. If there are additional loop improvements intended (e.g. better CI-fix retry limiting, main-branch loop guards), those should be tracked as separate issues or clarified here.
Planning complete. To resume this session:
Implementation Plan (updated)
What was already done
Commit
696dcdcadded_get_issue_labels()and placed aState/Questionguard inside the catch-up loop'sstatus == "success"branch. This stops the most common loop: repeated merge retries when CI is passing.Remaining gaps
The previous plan comment identified three gaps. Commit
696dcdcaddressed gap 3 partially, but gaps 1 and 2 are still open:Gap 1 —
_get_issue_labels()can raise and bypass the guard._get_issue_labels()calls_tea_get(), which raisesRuntimeErroron any API error. If the call fails, the exception propagates out of the catch-up loop uncaught, the guard is never evaluated, and the merge attempt happens anyway. After a failed merge the loop exits with an exception rather than settingState/Question.Gap 2 —
_merge_pr()raisingRuntimeErrorin catch-up does not setState/Question.Lines ~699–702 catch the exception and
continuewithout setting the label. On the next tick the PR is found again, CI is still passing,State/Questionis absent → merge is retried indefinitely.Gap 3 (partial) — guard is inside
status == "success"only.If
pr_runisNoneor has an unexpected status, the guard is skipped entirely. Moving the guard to the top of thefor pr in open_prs:loop body (before the CI-status checks) would make the protection unconditional.Changes required
File:
scripts/agent_loop.pyChange A — Move the
State/Questionguard to the top of the catch-upforloop and wrap it intry/except:Remove the duplicate guard that is currently inside the
status == "success"block.Change B — Set
State/Questionwhen_merge_pr()raises in the catch-up loop:Change C — Wrap the existing "PR still open after merge" State/Question-setting block in
try/exceptso that a Codeberg API failure during the label update does not preventcontinuefrom being reached:File:
scripts/test_agent_loop.pyAdd tests in a new class (e.g.
TestCatchUpLoopGuards):_get_issue_labelsraises → catch-up skips the PR, no merge attempted, no crash._merge_prraisesRuntimeError→ catch-up setsState/Questionon the issue and posts a comment.State/Questionand CI is not "success" (e.g.pr_run=None) → guard at top of loop still fires, merge skipped.Follow the existing
unittest.mock.patchpatterns in the test file.Risks and open questions
Idempotency of
_set_labels: Callingadd=[LABEL_QUESTION]when the label is already present should be a no-op on Codeberg. The fgj tool exits 0 in this case. No special handling needed.Should
LABEL_READYandLABEL_TO_PLANalso be removed when settingState/Question? The current code only removesLABEL_IN_PROGRESS. If an issue somehow has bothState/ReadyandState/Question, it would still appear in_ready_issues(). Low-risk in practice but worth removing them defensively.Section 2b (regular pending-issue flow) has no
State/Questionguard before merge. If the catch-up writes apending-cistate for an issue that already hasState/Question(e.g. after a new CI run starts on the PR branch), section 2b will attempt the merge without checking. This is a lower-priority gap because catch-up'spending-ciwrite happens only when CI is still running — after Change A is in place, the catch-up will skip the PR entirely whenState/Questionis set, so thepending-ciwrite won't occur. Mentioning it for completeness.