Split big source files #254

Open
opened 2026-05-25 17:26:55 +00:00 by guettli · 2 comments
guettli commented 2026-05-25 17:26:55 +00:00 (Migrated from codeberg.org)

Look at big files of the git repo.

Does it make sense to split some of them?

Look at big files of the git repo. Does it make sense to split some of them?
guettlibot commented 2026-05-25 17:33:02 +00:00 (Migrated from codeberg.org)

Implementation Plan: Split Big Source Files

File Inventory

File Lines Status
scripts/agent_loop.py 991 Primary candidate
scripts/test_agent_loop.py 938 Keep as-is (test file)
scripts/test_deploy_playstore.py 200 Keep as-is (test file)
ci/otel-receiver.py 195 Keep as-is
scripts/generate_build_history.py 181 Keep as-is
scripts/test_generate_build_history.py 113 Keep as-is
scripts/deploy_playstore.py 113 Keep as-is
deploy_cron.py 55 Keep as-is

Conclusion up front: Only agent_loop.py (991 lines) is a real candidate for splitting. All other files are well-sized and single-purpose. test_agent_loop.py at 938 lines is large but that reflects the complexity of the code under test — test files are intentionally paired and verbose.


agent_loop.py — Internal Structure

The file currently contains four distinct concerns:

  1. Forgejo API calls (~200 lines, lines 80–315)
    Functions like _ready_issues(), _to_plan_issues(), _latest_main_ci_run(), _find_pr_for_branch(), _set_labels(), _close_issue(), _comment_issue(), etc. These are all stateless wrappers around fgj / tea CLI calls.

  2. State file I/O (~60 lines, lines 317–377)
    Functions _read_state(), _write_state(), _clear_state(), _find_session_uuid() — pure file-based state management.

  3. Agent lifecycle (~80 lines, lines 380–465)
    Functions _start_agent(), _agent_alive(), _is_claude_process(), _agent_age_seconds(), _git_summary(), _kill_agent().

  4. Main orchestration loop (~530 lines, lines 481–991)
    cmd_list(), cmd_monitor(), _run_loop(), and main() — the core state machine and CLI entrypoint.


Proposed Split

Option A: Extract two helper modules (recommended)

scripts/forgejo_client.py (~200 lines)
Move all _fgj(), _tea_get(), and issue/PR/CI query+update functions here. This is the most natural boundary — these functions have no dependency on the state file or agent processes and could be reused or tested independently.

scripts/agent_state.py (~60 lines)
Move _read_state(), _write_state(), _clear_state(), _find_session_uuid() here. This isolates the state schema (a single JSON file) so future changes to the state format are localised.

scripts/agent_loop.py (~730 lines remaining)
Keeps agent lifecycle functions and the full _run_loop() orchestration, which is inherently complex and benefits from being readable in one place.

Option B: Extract three helper modules

Same as Option A, but also extract agent lifecycle into scripts/agent_process.py (~80 lines: _start_agent(), _agent_alive(), _kill_agent(), etc.), leaving agent_loop.py at ~650 lines.

Option C: No split (status quo)

The current 991-line file is navigable because it's linearly structured with clear section comments. The orchestration in _run_loop() references all three other concern groups, so any split introduces import dependencies. If the file rarely changes, the maintenance cost of splitting may exceed the benefit.


Impact on Tests

test_agent_loop.py currently imports from agent_loop directly and mocks many of the extracted functions. A split would require updating import paths in the test file (e.g., mocking forgejo_client._fgj instead of agent_loop._fgj). This is mechanical but ~50–80 mock patches would need updating.


Risks and Open Questions

  1. Is 991 lines actually causing pain? The issue asks whether splitting makes sense — the question to answer first is whether anyone has found agent_loop.py hard to navigate or modify. If not, splitting is churn.

  2. Option A vs B trade-off: Option A gives the most value per line of refactor effort. Option B adds another module boundary that may not be worth the extra indirection for 80 lines.

  3. Circular imports: _run_loop() currently calls both API functions and state functions. After splitting, agent_loop would import from both forgejo_client and agent_state — straightforward, no risk of circularity.

  4. CI/test breakage: All existing tests would need import-path updates for any mocked symbol that moves to a new module. Low risk but requires careful find-and-replace.

  5. otel-receiver.py: Contains a hand-written protobuf decoder (~60 lines). It could be extracted to a helper if protobuf decoding were ever reused elsewhere, but right now it's standalone — no split recommended.


Recommendation

Proceed with Option A if the goal is cleaner structure with meaningful boundaries. Skip the refactor if the file is not actively causing friction. The forgejo_client.py extraction is the highest-value change: it isolates all external CLI calls, makes mocking cleaner, and reduces agent_loop.py by ~20%.

## Implementation Plan: Split Big Source Files ### File Inventory | File | Lines | Status | |------|-------|--------| | `scripts/agent_loop.py` | 991 | **Primary candidate** | | `scripts/test_agent_loop.py` | 938 | Keep as-is (test file) | | `scripts/test_deploy_playstore.py` | 200 | Keep as-is (test file) | | `ci/otel-receiver.py` | 195 | Keep as-is | | `scripts/generate_build_history.py` | 181 | Keep as-is | | `scripts/test_generate_build_history.py` | 113 | Keep as-is | | `scripts/deploy_playstore.py` | 113 | Keep as-is | | `deploy_cron.py` | 55 | Keep as-is | **Conclusion up front:** Only `agent_loop.py` (991 lines) is a real candidate for splitting. All other files are well-sized and single-purpose. `test_agent_loop.py` at 938 lines is large but that reflects the complexity of the code under test — test files are intentionally paired and verbose. --- ### agent_loop.py — Internal Structure The file currently contains four distinct concerns: 1. **Forgejo API calls** (~200 lines, lines 80–315) Functions like `_ready_issues()`, `_to_plan_issues()`, `_latest_main_ci_run()`, `_find_pr_for_branch()`, `_set_labels()`, `_close_issue()`, `_comment_issue()`, etc. These are all stateless wrappers around `fgj` / `tea` CLI calls. 2. **State file I/O** (~60 lines, lines 317–377) Functions `_read_state()`, `_write_state()`, `_clear_state()`, `_find_session_uuid()` — pure file-based state management. 3. **Agent lifecycle** (~80 lines, lines 380–465) Functions `_start_agent()`, `_agent_alive()`, `_is_claude_process()`, `_agent_age_seconds()`, `_git_summary()`, `_kill_agent()`. 4. **Main orchestration loop** (~530 lines, lines 481–991) `cmd_list()`, `cmd_monitor()`, `_run_loop()`, and `main()` — the core state machine and CLI entrypoint. --- ### Proposed Split #### Option A: Extract two helper modules (recommended) **`scripts/forgejo_client.py`** (~200 lines) Move all `_fgj()`, `_tea_get()`, and issue/PR/CI query+update functions here. This is the most natural boundary — these functions have no dependency on the state file or agent processes and could be reused or tested independently. **`scripts/agent_state.py`** (~60 lines) Move `_read_state()`, `_write_state()`, `_clear_state()`, `_find_session_uuid()` here. This isolates the state schema (a single JSON file) so future changes to the state format are localised. **`scripts/agent_loop.py`** (~730 lines remaining) Keeps agent lifecycle functions and the full `_run_loop()` orchestration, which is inherently complex and benefits from being readable in one place. #### Option B: Extract three helper modules Same as Option A, but also extract agent lifecycle into `scripts/agent_process.py` (~80 lines: `_start_agent()`, `_agent_alive()`, `_kill_agent()`, etc.), leaving `agent_loop.py` at ~650 lines. #### Option C: No split (status quo) The current 991-line file is navigable because it's linearly structured with clear section comments. The orchestration in `_run_loop()` references all three other concern groups, so any split introduces import dependencies. If the file rarely changes, the maintenance cost of splitting may exceed the benefit. --- ### Impact on Tests `test_agent_loop.py` currently imports from `agent_loop` directly and mocks many of the extracted functions. A split would require updating import paths in the test file (e.g., mocking `forgejo_client._fgj` instead of `agent_loop._fgj`). This is mechanical but ~50–80 mock patches would need updating. --- ### Risks and Open Questions 1. **Is 991 lines actually causing pain?** The issue asks whether splitting *makes sense* — the question to answer first is whether anyone has found `agent_loop.py` hard to navigate or modify. If not, splitting is churn. 2. **Option A vs B trade-off:** Option A gives the most value per line of refactor effort. Option B adds another module boundary that may not be worth the extra indirection for 80 lines. 3. **Circular imports:** `_run_loop()` currently calls both API functions and state functions. After splitting, `agent_loop` would import from both `forgejo_client` and `agent_state` — straightforward, no risk of circularity. 4. **CI/test breakage:** All existing tests would need import-path updates for any mocked symbol that moves to a new module. Low risk but requires careful find-and-replace. 5. **`otel-receiver.py`:** Contains a hand-written protobuf decoder (~60 lines). It could be extracted to a helper if protobuf decoding were ever reused elsewhere, but right now it's standalone — no split recommended. --- ### Recommendation **Proceed with Option A** if the goal is cleaner structure with meaningful boundaries. Skip the refactor if the file is not actively causing friction. The `forgejo_client.py` extraction is the highest-value change: it isolates all external CLI calls, makes mocking cleaner, and reduces `agent_loop.py` by ~20%.
guettlibot commented 2026-05-25 17:33:05 +00:00 (Migrated from codeberg.org)

Implementation Plan: Split Big Source Files

Summary of Current File Sizes

File Lines Role
scripts/agent_loop.py 991 Core CI/CD orchestration
scripts/test_agent_loop.py 938 Unit tests for above
scripts/test_deploy_playstore.py 200 Play Store deployment tests
ci/otel-receiver.py 195 OpenTelemetry trace receiver
scripts/generate_build_history.py 181 Build artifact documentation
scripts/deploy_playstore.py 113 Play Store upload
scripts/test_generate_build_history.py 113 Build history tests
scripts/deploy_cron.py 55 Website deployment trigger

Recommendation: Only Split agent_loop.py

The other files are all under ~200 lines and have a single, cohesive responsibility each — they do not need splitting. agent_loop.py at 991 lines is the only file where splitting would genuinely improve maintainability.

test_agent_loop.py (938 lines) should be split in tandem with agent_loop.py, so each test module mirrors its source module.


Proposed Module Split for agent_loop.py

The file has five clearly separable concerns:

1. agent_loop_state.py (~80 lines)

State file I/O and heartbeat tracking.

  • _read_state(), _write_state(), _clear_state()
  • _update_heartbeat()
  • _find_session_uuid()

2. agent_loop_agents.py (~120 lines)

Agent lifecycle management.

  • _start_agent(), _agent_alive(), _agent_age_seconds(), _kill_agent()
  • _is_claude_process()

3. agent_loop_codeberg.py (~120 lines)

Codeberg/Forgejo API wrappers.

  • _fgj(), _tea_get()
  • _set_labels(), _close_issue(), _comment_issue()
  • _ready_issues(), _to_plan_issues(), _open_issue_prs()
  • _issue_url(), _ci_run_url()

4. agent_loop_ci.py (~130 lines)

CI run querying and PR merge operations.

  • _latest_main_ci_run(), _latest_ci_run_for_branch(), _latest_ci_run_for_pr()
  • _find_pr_for_branch(), _merge_pr()
  • _handle_pr_still_open_after_merge()

5. agent_loop.py (~450 lines, the main orchestrator)

Keeps only the top-level orchestration logic.

  • Configuration constants (REPO, LABEL_*, etc.)
  • _run_loop() — the main state machine
  • cmd_list(), cmd_monitor(), main()
  • Imports from the four modules above

6. Test files split in parallel

  • test_agent_loop_state.py — tests for TestStateFile, TestHeartbeat, TestFindSessionUuid
  • test_agent_loop_agents.py — tests for TestAgentAlive, TestIsClaudeProcess, TestKillAgent, TestStartAgent
  • test_agent_loop_codeberg.py — tests for TestUrlHelpers, TestOutputFormat
  • test_agent_loop_ci.py — tests for TestLatestMainCiRun, TestLatestCiRunForBranch, TestMergeFailsOpen
  • test_agent_loop.py (reduced) — keeps TestMain, TestPendingCi, TestRunLoopResumeCommand, TestCatchupSkipsQuestionIssues

Approach

  1. Create new modules by moving functions — no logic changes, no renames.
  2. Replace each moved function in agent_loop.py with an import from its new home.
  3. Split the test file in the same commit so tests continue to pass after each move.
  4. Run python -m pytest scripts/ after each module extraction to verify nothing broke.

Risks and Open Questions

  • Circular imports: The main orchestration in agent_loop.py calls helpers in all four sub-modules. As long as the sub-modules do not import from agent_loop.py (they should not need to), there is no risk of circular imports.
  • Constants shared across modules: REPO, LABEL_*, and ALLOWED_ISSUE_AUTHORS are used in both agent_loop_codeberg.py and agent_loop.py. Options: (a) keep them in agent_loop.py and pass as arguments, or (b) move to a agent_loop_config.py constants file. Option (b) is cleaner.
  • Test isolation: Current tests mock subprocess and os at the agent_loop module level. After splitting, each test module must mock at the correct module path (e.g., agent_loop_agents.subprocess instead of agent_loop.subprocess). This requires careful attention during the split.
  • _run_loop() length: Even after extraction, _run_loop() itself is ~400 lines. It could be further broken into named sub-functions within agent_loop.py (e.g., _handle_pending_issue(), _handle_catchup(), _handle_no_agent()) without moving them to separate files. This would be a good follow-up.

Files to Change

  • scripts/agent_loop.py — split into 5 files
  • scripts/test_agent_loop.py — split into 5 test files (in tandem)
  • ci/otel-receiver.py, scripts/deploy_playstore.py, scripts/generate_build_history.py, scripts/deploy_cron.pyno change needed, already small and cohesive
## Implementation Plan: Split Big Source Files ### Summary of Current File Sizes | File | Lines | Role | |------|-------|------| | `scripts/agent_loop.py` | 991 | Core CI/CD orchestration | | `scripts/test_agent_loop.py` | 938 | Unit tests for above | | `scripts/test_deploy_playstore.py` | 200 | Play Store deployment tests | | `ci/otel-receiver.py` | 195 | OpenTelemetry trace receiver | | `scripts/generate_build_history.py` | 181 | Build artifact documentation | | `scripts/deploy_playstore.py` | 113 | Play Store upload | | `scripts/test_generate_build_history.py` | 113 | Build history tests | | `scripts/deploy_cron.py` | 55 | Website deployment trigger | --- ### Recommendation: Only Split `agent_loop.py` The other files are all under ~200 lines and have a single, cohesive responsibility each — they do not need splitting. `agent_loop.py` at 991 lines is the only file where splitting would genuinely improve maintainability. `test_agent_loop.py` (938 lines) should be split in tandem with `agent_loop.py`, so each test module mirrors its source module. --- ### Proposed Module Split for `agent_loop.py` The file has five clearly separable concerns: #### 1. `agent_loop_state.py` (~80 lines) State file I/O and heartbeat tracking. - `_read_state()`, `_write_state()`, `_clear_state()` - `_update_heartbeat()` - `_find_session_uuid()` #### 2. `agent_loop_agents.py` (~120 lines) Agent lifecycle management. - `_start_agent()`, `_agent_alive()`, `_agent_age_seconds()`, `_kill_agent()` - `_is_claude_process()` #### 3. `agent_loop_codeberg.py` (~120 lines) Codeberg/Forgejo API wrappers. - `_fgj()`, `_tea_get()` - `_set_labels()`, `_close_issue()`, `_comment_issue()` - `_ready_issues()`, `_to_plan_issues()`, `_open_issue_prs()` - `_issue_url()`, `_ci_run_url()` #### 4. `agent_loop_ci.py` (~130 lines) CI run querying and PR merge operations. - `_latest_main_ci_run()`, `_latest_ci_run_for_branch()`, `_latest_ci_run_for_pr()` - `_find_pr_for_branch()`, `_merge_pr()` - `_handle_pr_still_open_after_merge()` #### 5. `agent_loop.py` (~450 lines, the main orchestrator) Keeps only the top-level orchestration logic. - Configuration constants (`REPO`, `LABEL_*`, etc.) - `_run_loop()` — the main state machine - `cmd_list()`, `cmd_monitor()`, `main()` - Imports from the four modules above #### 6. Test files split in parallel - `test_agent_loop_state.py` — tests for `TestStateFile`, `TestHeartbeat`, `TestFindSessionUuid` - `test_agent_loop_agents.py` — tests for `TestAgentAlive`, `TestIsClaudeProcess`, `TestKillAgent`, `TestStartAgent` - `test_agent_loop_codeberg.py` — tests for `TestUrlHelpers`, `TestOutputFormat` - `test_agent_loop_ci.py` — tests for `TestLatestMainCiRun`, `TestLatestCiRunForBranch`, `TestMergeFailsOpen` - `test_agent_loop.py` (reduced) — keeps `TestMain`, `TestPendingCi`, `TestRunLoopResumeCommand`, `TestCatchupSkipsQuestionIssues` --- ### Approach 1. Create new modules by moving functions — no logic changes, no renames. 2. Replace each moved function in `agent_loop.py` with an import from its new home. 3. Split the test file in the same commit so tests continue to pass after each move. 4. Run `python -m pytest scripts/` after each module extraction to verify nothing broke. --- ### Risks and Open Questions - **Circular imports:** The main orchestration in `agent_loop.py` calls helpers in all four sub-modules. As long as the sub-modules do not import from `agent_loop.py` (they should not need to), there is no risk of circular imports. - **Constants shared across modules:** `REPO`, `LABEL_*`, and `ALLOWED_ISSUE_AUTHORS` are used in both `agent_loop_codeberg.py` and `agent_loop.py`. Options: (a) keep them in `agent_loop.py` and pass as arguments, or (b) move to a `agent_loop_config.py` constants file. Option (b) is cleaner. - **Test isolation:** Current tests mock `subprocess` and `os` at the `agent_loop` module level. After splitting, each test module must mock at the correct module path (e.g., `agent_loop_agents.subprocess` instead of `agent_loop.subprocess`). This requires careful attention during the split. - **`_run_loop()` length:** Even after extraction, `_run_loop()` itself is ~400 lines. It could be further broken into named sub-functions within `agent_loop.py` (e.g., `_handle_pending_issue()`, `_handle_catchup()`, `_handle_no_agent()`) without moving them to separate files. This would be a good follow-up. --- ### Files to Change - `scripts/agent_loop.py` — split into 5 files - `scripts/test_agent_loop.py` — split into 5 test files (in tandem) - `ci/otel-receiver.py`, `scripts/deploy_playstore.py`, `scripts/generate_build_history.py`, `scripts/deploy_cron.py` — **no change needed**, already small and cohesive
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: guettli/sharedinbox#254