Why does agent loop create merge conflicts? #313

Closed
opened 2026-05-28 02:50:14 +00:00 by guettli · 2 comments
guettli commented 2026-05-28 02:50:14 +00:00 (Migrated from codeberg.org)

My intention was that the agent loop does planning concurrently (issues with the label ToPlan), but issues that need code changes are handled one after the other.

This does not seem to be the case.

See

https://codeberg.org/guettli/sharedinbox/pulls/307

Research why the merge conflict happens in the above case.

Create a plan for how to fix the agent loop so that merge conflicts are avoided.

guettlibot commented 2026-05-28 03:49:39 +00:00 (Migrated from codeberg.org)

Implementation Plan: Preventing Merge Conflicts in the Agent Loop

Root Cause Analysis

PR #307 shows merge conflicts in 10 files (db_schema_version.dart, user_preferences.dart, etc.). A detailed read of scripts/agent_loop.py points to two independent root causes:


Root Cause 1: Multiple concurrent open issue-N-fix PRs

The loop can end up with two or more open issue branches simultaneously. Here is one concrete path:

  1. Agent for issue N runs for >1 hour and is killed (the MAX_AGENT_AGE_SECONDS guard).
  2. The loop sets issue N to State/Question and clears the state file — but the PR for issue-N-fix stays open.
  3. On the next tick the loop has no pending state. The catch-up section (2c in the file) sees issue N's PR is State/Question and skips it. The loop then falls through to the "Find a Ready issue" block and starts an agent for issue N+1.
  4. Issue N+1's agent creates issue-(N+1)-fix branching from the same version of main as issue N's branch.
  5. Now two branches exist that both modify the same files (e.g., both increment db_schema_version from 34 → 35).
  6. When the first branch is merged, main advances. The second branch is now stale and conflicts.

The same thing happens after a server restart/state-file loss: the catch-up loop scans open PRs and tries to merge them in order, but all of them were created from the same base commit on main.
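Steps 4 to 6 can be illustrated with a toy model of the branch timeline. This is only a sketch under simplifying assumptions: main is modelled as a counter, and `Branch` and `is_stale` are hypothetical names that do not exist in scripts/agent_loop.py.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    base: int  # index of the main commit the branch was created from


def is_stale(branch: Branch, main_head: int) -> bool:
    """A branch is stale once main has moved past the commit it started from."""
    return branch.base < main_head


main_head = 0                                   # main when both agents start
issue_n = Branch("issue-N-fix", base=main_head)
issue_n1 = Branch("issue-(N+1)-fix", base=main_head)

stale_before = is_stale(issue_n1, main_head)    # both branches share one base
main_head += 1                                  # first branch merges, main advances
stale_after = is_stale(issue_n1, main_head)     # second branch now lags behind main
```

Being stale does not guarantee a conflict, but it is the precondition for one: a conflict only becomes possible once main contains commits that the second branch has never seen.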


Root Cause 2: No freshness check before merging

When CI passes on an issue PR, the loop immediately calls _merge_pr() without checking whether the branch is still up-to-date with main. In the meantime, main may have advanced (via Renovate PRs, direct CI-fix pushes, or other merged issues). For files where both main and the branch changed the same lines (e.g., around a version counter that both sides incremented from the same base value), git cannot auto-resolve the conflict.

The existing _handle_pr_still_open_after_merge helper already spawns a rebase agent when it detects mergeable == False after a failed merge — but this is reactive, not proactive. A cleaner fix is to check mergeability before attempting the merge.
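A local freshness check is one possible complement to the API-based mergeability check. The sketch below relies on `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second and with 1 when it is not. The function name `branch_contains_main`, the default ref `origin/main`, and the injectable `run` parameter are assumptions made for illustration; none of them come from scripts/agent_loop.py.

```python
import subprocess
from typing import Callable


def branch_contains_main(
    branch: str,
    main: str = "origin/main",  # assumed remote-tracking ref
    run: Callable = subprocess.run,
) -> bool:
    """Return True if `main` is an ancestor of `branch` (branch is up to date)."""
    # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B
    # and 1 when it is not; any other exit code signals an error.
    result = run(["git", "merge-base", "--is-ancestor", main, branch], check=False)
    if result.returncode not in (0, 1):
        raise RuntimeError(
            f"git merge-base failed with exit code {result.returncode}"
        )
    return result.returncode == 0
```

The check only says whether the branch already contains everything on main. A branch that is behind main can still merge cleanly, so this is a stricter condition than the `mergeable` flag.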


What Does NOT Need Changing

The planning agent path (State/ToPlan → plan agent) does not create branches or modify code, so it cannot cause merge conflicts by itself. Making planning concurrent is a separate enhancement (not a bug fix) and should be a separate issue.


Files to Change

Only scripts/agent_loop.py needs to change. No other files are involved.


Proposed Changes

Change 1 — Block new implementation agents while an active issue PR is open

Insert a guard at the bottom of _run_loop(), just before the "Find a Ready issue" block (currently around line 1065):

# Safety: do not start a new implementation agent while there is an open
# issue-N-fix PR that is not in State/Question. Two concurrent issue branches
# branched from the same main commit will cause merge conflicts when the first
# one lands.
open_prs = _open_issue_prs()
active_issue_prs = []
for pr in open_prs:
    head = pr.get("head", {})
    ref = head.get("ref") or head.get("label", "").split(":")[-1]
    m = re.match(r"^issue-(\d+)-fix$", ref or "")
    if not m:
        continue
    issue_num = int(m.group(1))
    if LABEL_QUESTION not in _get_issue_labels(issue_num):
        active_issue_prs.append(pr)

if active_issue_prs:
    pr_refs = ", ".join(f"#{p['number']}" for p in active_issue_prs)
    print(f"Open active issue PRs ({pr_refs}) — waiting before starting new implementation agent.")
    return 0

This ensures that at most one implementation branch is alive at a time (unless it is explicitly stuck in State/Question, in which case we let the next issue proceed so the loop doesn't stall forever).
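To make the guard's filtering rule easier to review and test in isolation, it could be pulled out into a pure function. The following is a sketch, not the code from scripts/agent_loop.py: `active_issue_prs` and `labels_for_issue` are hypothetical names, the value of `LABEL_QUESTION` is assumed to be "State/Question", and the PR numbers, issue numbers, and the "State/Other" label in the sample data are made up.

```python
import re
from typing import Callable, Iterable

ISSUE_BRANCH_RE = re.compile(r"^issue-(\d+)-fix$")
LABEL_QUESTION = "State/Question"  # assumed value of the loop's constant


def active_issue_prs(
    prs: Iterable[dict],
    labels_for_issue: Callable[[int], set],
) -> list:
    """Return the open issue-N-fix PRs whose issue is not in State/Question."""
    active = []
    for pr in prs:
        head = pr.get("head") or {}
        ref = head.get("ref") or head.get("label", "").split(":")[-1]
        match = ISSUE_BRANCH_RE.match(ref or "")
        if match is None:
            continue  # not an issue branch, e.g. a Renovate PR
        if LABEL_QUESTION not in labels_for_issue(int(match.group(1))):
            active.append(pr)
    return active


# Made-up sample data: issue 5 is parked in State/Question, issue 6 is not.
sample_prs = [
    {"number": 11, "head": {"ref": "issue-5-fix"}},
    {"number": 12, "head": {"ref": "issue-6-fix"}},
    {"number": 13, "head": {"ref": "renovate/some-dependency"}},
]
sample_labels = {5: {"State/Question"}, 6: {"State/Other"}}
blocking = active_issue_prs(sample_prs, lambda n: sample_labels.get(n, set()))
```

With this sample data only PR 12 blocks a new implementation agent: PR 11 is skipped because its issue is in State/Question, and PR 13 is skipped because its branch name does not match the issue-N-fix pattern.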


Change 2 — Check PR mergeability before attempting to merge

Add a helper:

def _pr_mergeable(pr_number: int) -> bool | None:
    """Return True/False/None (unknown) for PR mergeability via the Forgejo API."""
    try:
        data = _tea_get(f"/repos/{REPO}/pulls/{pr_number}")
        return data.get("mergeable")   # True, False, or None (Forgejo still computing)
    except RuntimeError:
        return None

Then, in both the "CI passed — merge" path (section 2b, ~line 788) and the catch-up loop (section 2c, ~line 862), add before _merge_pr():

mergeable = _pr_mergeable(pr_number)
if mergeable is False:
    print(f"PR #{pr_number} is not mergeable (branch behind or conflicting) — spawning rebase agent.")
    prompt = (
        f"Rebase branch `{branch}` onto main to resolve merge conflicts, then push. "
        "Do not change any logic — only resolve conflicts and push."
    )
    session_name = f"rebase-pr-{pr_number}"
    pid = _start_agent(prompt, session_name)
    _write_state(pid, issue_num, "pending-ci", session_name=session_name)
    return 0
if mergeable is None:
    print(f"PR #{pr_number} mergeability unknown — waiting one tick.")
    _write_state(None, issue_num, "pending-ci")
    return 0

This reuses the existing rebase-agent pattern from _handle_pr_still_open_after_merge but triggers it before the merge attempt rather than after.
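The three-way result of the helper is easiest to see on raw JSON bodies. The sketch below assumes the pull request response is a JSON object with an optional `mergeable` field; `parse_mergeable` is a hypothetical name, and treating a missing or non-boolean field as "unknown" is a design choice made here, not documented API behavior.

```python
import json
from typing import Optional


def parse_mergeable(body: str) -> Optional[bool]:
    """Map a pull request JSON body to True, False, or None (unknown)."""
    data = json.loads(body)
    value = data.get("mergeable")
    # JSON null, a missing field, or an unexpected type all count as unknown.
    return value if isinstance(value, bool) else None


can_merge = parse_mergeable('{"number": 1, "mergeable": true}')
conflicting = parse_mergeable('{"number": 1, "mergeable": false}')
unknown = parse_mergeable('{"number": 1, "mergeable": null}')
```

Callers should compare the result with `is True`, `is False`, and `is None`, because a plain truthiness test would treat "unknown" the same as "not mergeable" and spawn a rebase agent unnecessarily.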


Change 3: Comment on orphaned PRs when an agent is killed

When the timeout guard kills an agent (section 1, ~line 663), the PR it created is left open but the issue is set to State/Question. This PR becomes an orphan. Add after the kill-and-clear block:

# If a PR was already opened for this issue, add a warning comment so
# a human knows the PR is incomplete and may need to be closed or rebased.
branch = f"issue-{issue}-fix"
orphan_pr = _find_pr_for_branch(branch)
if orphan_pr:
    _comment_issue(
        issue,
        f"Agent was killed after {age/60:.0f} min. "
        f"PR #{orphan_pr['number']} () may be incomplete. "
        "Please review and close the PR if it should not be merged, "
        "or manually rebase and re-open for review.",
    )

This does not automatically close the PR (a human might want to salvage partial work), but it makes the situation visible and prevents the human from being confused about why the PR is sitting open.


Risks and Open Questions

  1. Change 1 can stall if a PR is permanently stuck (e.g., CI is always failing on the branch and no human intervenes). Mitigation: The guard only applies to non-State/Question PRs. If a CI-failing PR is manually set to State/Question by a human, the loop will skip it and continue. Document this as an escape hatch.

  2. Forgejo's mergeable field can be null while it is still computing (this is the standard Forgejo/GitHub behavior). Change 2 handles this by waiting one tick and retrying. However, if Forgejo never computes it (a rare API bug), the loop would spin forever. Mitigation: count the number of consecutive null responses and fall through to the optimistic merge after N retries (e.g., 3 ticks = 30 minutes).

  3. The rebase agent may not have the right git identity set up in ~/sharedinbox. The existing _start_agent prompt should include instructions to configure git identity before rebasing if needed.

  4. The concurrent-planning enhancement the issue author mentions ("planning concurrently") is NOT addressed here. The current loop runs one agent at a time (including planners). Making planners truly concurrent (e.g., while an implementation CI is pending) requires tracking multiple concurrent agent PIDs, which is a larger architectural change. Recommend opening a separate issue for that.
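The mitigation for risk 2 could look roughly like the sketch below. It assumes the count of consecutive unknown responses is persisted between ticks (for example in the state file) and uses the threshold of 3 from the example above; `next_step`, `MAX_UNKNOWN_TICKS`, and the action strings are hypothetical names.

```python
from typing import Optional, Tuple

MAX_UNKNOWN_TICKS = 3  # assumed threshold, taken from the "3 ticks" example


def next_step(mergeable: Optional[bool], unknown_ticks: int) -> Tuple[str, int]:
    """Return (action, updated count of consecutive unknown responses)."""
    if mergeable is False:
        return "rebase", 0
    if mergeable is True:
        return "merge", 0
    unknown_ticks += 1
    if unknown_ticks >= MAX_UNKNOWN_TICKS:
        # Mergeability stayed unknown for too long: fall through to the
        # optimistic merge and let the existing post-merge handling react.
        return "merge", 0
    return "wait", unknown_ticks


# Three consecutive unknown responses: wait, wait, then merge optimistically.
history = []
count = 0
for _ in range(3):
    action, count = next_step(None, count)
    history.append(action)
```

Resetting the counter on any definite answer keeps a single slow computation from counting against later, unrelated waits.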


Implementation Order

  1. Implement Change 1 first — it is the primary safeguard and the simplest change.
  2. Implement Change 2 — it is a proactive improvement on the already-existing rebase-agent pattern.
  3. Implement Change 3 — it is a quality-of-life improvement for debugging orphaned PRs.

Each change can be reviewed and merged independently.

guettlibot commented 2026-05-28 03:50:02 +00:00 (Migrated from codeberg.org)

Planning complete. To resume this session:

claude --resume 72c379f0-56a4-4386-a875-0677cc3664e8 --dangerously-skip-permissions
Reference: guettli/sharedinbox#313