Split big source files #254

New Issue

2026-05-25T17:26:55Z

guettli commented

2026-05-25 17:26:55 +00:00

(Migrated from codeberg.org)

Look at big files of the git repo.

Does it make sense to split some of them?

Look at big files of the git repo. Does it make sense to split some of them?

guettlibot commented

2026-05-25 17:33:02 +00:00

(Migrated from codeberg.org)

Implementation Plan: Split Big Source Files

File Inventory

File	Lines	Status
`scripts/agent_loop.py`	991	Primary candidate
`scripts/test_agent_loop.py`	938	Keep as-is (test file)
`scripts/test_deploy_playstore.py`	200	Keep as-is (test file)
`ci/otel-receiver.py`	195	Keep as-is
`scripts/generate_build_history.py`	181	Keep as-is
`scripts/test_generate_build_history.py`	113	Keep as-is
`scripts/deploy_playstore.py`	113	Keep as-is
`deploy_cron.py`	55	Keep as-is

Conclusion up front: Only agent_loop.py (991 lines) is a real candidate for splitting. All other files are well-sized and single-purpose. test_agent_loop.py at 938 lines is large but that reflects the complexity of the code under test — test files are intentionally paired and verbose.

agent_loop.py — Internal Structure

The file currently contains four distinct concerns:

Forgejo API calls (~200 lines, lines 80–315)
Functions like _ready_issues(), _to_plan_issues(), _latest_main_ci_run(), _find_pr_for_branch(), _set_labels(), _close_issue(), _comment_issue(), etc. These are all stateless wrappers around fgj / tea CLI calls.
State file I/O (~60 lines, lines 317–377)
Functions _read_state(), _write_state(), _clear_state(), _find_session_uuid() — pure file-based state management.
Agent lifecycle (~80 lines, lines 380–465)
Functions _start_agent(), _agent_alive(), _is_claude_process(), _agent_age_seconds(), _git_summary(), _kill_agent().
Main orchestration loop (~530 lines, lines 481–991)
cmd_list(), cmd_monitor(), _run_loop(), and main() — the core state machine and CLI entrypoint.

Proposed Split

Option A: Extract two helper modules (recommended)

scripts/forgejo_client.py (~200 lines)
Move all _fgj(), _tea_get(), and issue/PR/CI query+update functions here. This is the most natural boundary — these functions have no dependency on the state file or agent processes and could be reused or tested independently.

scripts/agent_state.py (~60 lines)
Move _read_state(), _write_state(), _clear_state(), _find_session_uuid() here. This isolates the state schema (a single JSON file) so future changes to the state format are localised.

scripts/agent_loop.py (~730 lines remaining)
Keeps agent lifecycle functions and the full _run_loop() orchestration, which is inherently complex and benefits from being readable in one place.

Option B: Extract three helper modules

Same as Option A, but also extract agent lifecycle into scripts/agent_process.py (~80 lines: _start_agent(), _agent_alive(), _kill_agent(), etc.), leaving agent_loop.py at ~650 lines.

Option C: No split (status quo)

The current 991-line file is navigable because it's linearly structured with clear section comments. The orchestration in _run_loop() references all three other concern groups, so any split introduces import dependencies. If the file rarely changes, the maintenance cost of splitting may exceed the benefit.

Impact on Tests

test_agent_loop.py currently imports from agent_loop directly and mocks many of the extracted functions. A split would require updating import paths in the test file (e.g., mocking forgejo_client._fgj instead of agent_loop._fgj). This is mechanical but ~50–80 mock patches would need updating.

Risks and Open Questions

Is 991 lines actually causing pain? The issue asks whether splitting makes sense — the question to answer first is whether anyone has found agent_loop.py hard to navigate or modify. If not, splitting is churn.
Option A vs B trade-off: Option A gives the most value per line of refactor effort. Option B adds another module boundary that may not be worth the extra indirection for 80 lines.
Circular imports: _run_loop() currently calls both API functions and state functions. After splitting, agent_loop would import from both forgejo_client and agent_state — straightforward, no risk of circularity.
CI/test breakage: All existing tests would need import-path updates for any mocked symbol that moves to a new module. Low risk but requires careful find-and-replace.
otel-receiver.py: Contains a hand-written protobuf decoder (~60 lines). It could be extracted to a helper if protobuf decoding were ever reused elsewhere, but right now it's standalone — no split recommended.

Recommendation

Proceed with Option A if the goal is cleaner structure with meaningful boundaries. Skip the refactor if the file is not actively causing friction. The forgejo_client.py extraction is the highest-value change: it isolates all external CLI calls, makes mocking cleaner, and reduces agent_loop.py by ~20%.

## Implementation Plan: Split Big Source Files ### File Inventory | File | Lines | Status | |------|-------|--------| | `scripts/agent_loop.py` | 991 | **Primary candidate** | | `scripts/test_agent_loop.py` | 938 | Keep as-is (test file) | | `scripts/test_deploy_playstore.py` | 200 | Keep as-is (test file) | | `ci/otel-receiver.py` | 195 | Keep as-is | | `scripts/generate_build_history.py` | 181 | Keep as-is | | `scripts/test_generate_build_history.py` | 113 | Keep as-is | | `scripts/deploy_playstore.py` | 113 | Keep as-is | | `deploy_cron.py` | 55 | Keep as-is | **Conclusion up front:** Only `agent_loop.py` (991 lines) is a real candidate for splitting. All other files are well-sized and single-purpose. `test_agent_loop.py` at 938 lines is large but that reflects the complexity of the code under test — test files are intentionally paired and verbose. --- ### agent_loop.py — Internal Structure The file currently contains four distinct concerns: 1. **Forgejo API calls** (~200 lines, lines 80–315) Functions like `_ready_issues()`, `_to_plan_issues()`, `_latest_main_ci_run()`, `_find_pr_for_branch()`, `_set_labels()`, `_close_issue()`, `_comment_issue()`, etc. These are all stateless wrappers around `fgj` / `tea` CLI calls. 2. **State file I/O** (~60 lines, lines 317–377) Functions `_read_state()`, `_write_state()`, `_clear_state()`, `_find_session_uuid()` — pure file-based state management. 3. **Agent lifecycle** (~80 lines, lines 380–465) Functions `_start_agent()`, `_agent_alive()`, `_is_claude_process()`, `_agent_age_seconds()`, `_git_summary()`, `_kill_agent()`. 4. **Main orchestration loop** (~530 lines, lines 481–991) `cmd_list()`, `cmd_monitor()`, `_run_loop()`, and `main()` — the core state machine and CLI entrypoint. --- ### Proposed Split #### Option A: Extract two helper modules (recommended) **`scripts/forgejo_client.py`** (~200 lines) Move all `_fgj()`, `_tea_get()`, and issue/PR/CI query+update functions here. This is the most natural boundary — these functions have no dependency on the state file or agent processes and could be reused or tested independently. **`scripts/agent_state.py`** (~60 lines) Move `_read_state()`, `_write_state()`, `_clear_state()`, `_find_session_uuid()` here. This isolates the state schema (a single JSON file) so future changes to the state format are localised. **`scripts/agent_loop.py`** (~730 lines remaining) Keeps agent lifecycle functions and the full `_run_loop()` orchestration, which is inherently complex and benefits from being readable in one place. #### Option B: Extract three helper modules Same as Option A, but also extract agent lifecycle into `scripts/agent_process.py` (~80 lines: `_start_agent()`, `_agent_alive()`, `_kill_agent()`, etc.), leaving `agent_loop.py` at ~650 lines. #### Option C: No split (status quo) The current 991-line file is navigable because it's linearly structured with clear section comments. The orchestration in `_run_loop()` references all three other concern groups, so any split introduces import dependencies. If the file rarely changes, the maintenance cost of splitting may exceed the benefit. --- ### Impact on Tests `test_agent_loop.py` currently imports from `agent_loop` directly and mocks many of the extracted functions. A split would require updating import paths in the test file (e.g., mocking `forgejo_client._fgj` instead of `agent_loop._fgj`). This is mechanical but ~50–80 mock patches would need updating. --- ### Risks and Open Questions 1. **Is 991 lines actually causing pain?** The issue asks whether splitting *makes sense* — the question to answer first is whether anyone has found `agent_loop.py` hard to navigate or modify. If not, splitting is churn. 2. **Option A vs B trade-off:** Option A gives the most value per line of refactor effort. Option B adds another module boundary that may not be worth the extra indirection for 80 lines. 3. **Circular imports:** `_run_loop()` currently calls both API functions and state functions. After splitting, `agent_loop` would import from both `forgejo_client` and `agent_state` — straightforward, no risk of circularity. 4. **CI/test breakage:** All existing tests would need import-path updates for any mocked symbol that moves to a new module. Low risk but requires careful find-and-replace. 5. **`otel-receiver.py`:** Contains a hand-written protobuf decoder (~60 lines). It could be extracted to a helper if protobuf decoding were ever reused elsewhere, but right now it's standalone — no split recommended. --- ### Recommendation **Proceed with Option A** if the goal is cleaner structure with meaningful boundaries. Skip the refactor if the file is not actively causing friction. The `forgejo_client.py` extraction is the highest-value change: it isolates all external CLI calls, makes mocking cleaner, and reduces `agent_loop.py` by ~20%.

guettlibot commented

2026-05-25 17:33:05 +00:00

(Migrated from codeberg.org)

Implementation Plan: Split Big Source Files

Summary of Current File Sizes

File	Lines	Role
`scripts/agent_loop.py`	991	Core CI/CD orchestration
`scripts/test_agent_loop.py`	938	Unit tests for above
`scripts/test_deploy_playstore.py`	200	Play Store deployment tests
`ci/otel-receiver.py`	195	OpenTelemetry trace receiver
`scripts/generate_build_history.py`	181	Build artifact documentation
`scripts/deploy_playstore.py`	113	Play Store upload
`scripts/test_generate_build_history.py`	113	Build history tests
`scripts/deploy_cron.py`	55	Website deployment trigger

Recommendation: Only Split `agent_loop.py`

The other files are all under ~200 lines and have a single, cohesive responsibility each — they do not need splitting. agent_loop.py at 991 lines is the only file where splitting would genuinely improve maintainability.

test_agent_loop.py (938 lines) should be split in tandem with agent_loop.py, so each test module mirrors its source module.

Proposed Module Split for `agent_loop.py`

The file has five clearly separable concerns:

1. `agent_loop_state.py` (~80 lines)

State file I/O and heartbeat tracking.

_read_state(), _write_state(), _clear_state()
_update_heartbeat()
_find_session_uuid()

2. `agent_loop_agents.py` (~120 lines)

Agent lifecycle management.

_start_agent(), _agent_alive(), _agent_age_seconds(), _kill_agent()
_is_claude_process()

3. `agent_loop_codeberg.py` (~120 lines)

Codeberg/Forgejo API wrappers.

_fgj(), _tea_get()
_set_labels(), _close_issue(), _comment_issue()
_ready_issues(), _to_plan_issues(), _open_issue_prs()
_issue_url(), _ci_run_url()

4. `agent_loop_ci.py` (~130 lines)

CI run querying and PR merge operations.

_latest_main_ci_run(), _latest_ci_run_for_branch(), _latest_ci_run_for_pr()
_find_pr_for_branch(), _merge_pr()
_handle_pr_still_open_after_merge()

5. `agent_loop.py` (~450 lines, the main orchestrator)

Keeps only the top-level orchestration logic.

Configuration constants (REPO, LABEL_*, etc.)
_run_loop() — the main state machine
cmd_list(), cmd_monitor(), main()
Imports from the four modules above

6. Test files split in parallel

test_agent_loop_state.py — tests for TestStateFile, TestHeartbeat, TestFindSessionUuid
test_agent_loop_agents.py — tests for TestAgentAlive, TestIsClaudeProcess, TestKillAgent, TestStartAgent
test_agent_loop_codeberg.py — tests for TestUrlHelpers, TestOutputFormat
test_agent_loop_ci.py — tests for TestLatestMainCiRun, TestLatestCiRunForBranch, TestMergeFailsOpen
test_agent_loop.py (reduced) — keeps TestMain, TestPendingCi, TestRunLoopResumeCommand, TestCatchupSkipsQuestionIssues

Approach

Create new modules by moving functions — no logic changes, no renames.
Replace each moved function in agent_loop.py with an import from its new home.
Split the test file in the same commit so tests continue to pass after each move.
Run python -m pytest scripts/ after each module extraction to verify nothing broke.

Risks and Open Questions

Circular imports: The main orchestration in agent_loop.py calls helpers in all four sub-modules. As long as the sub-modules do not import from agent_loop.py (they should not need to), there is no risk of circular imports.
Constants shared across modules: REPO, LABEL_*, and ALLOWED_ISSUE_AUTHORS are used in both agent_loop_codeberg.py and agent_loop.py. Options: (a) keep them in agent_loop.py and pass as arguments, or (b) move to a agent_loop_config.py constants file. Option (b) is cleaner.
Test isolation: Current tests mock subprocess and os at the agent_loop module level. After splitting, each test module must mock at the correct module path (e.g., agent_loop_agents.subprocess instead of agent_loop.subprocess). This requires careful attention during the split.
_run_loop() length: Even after extraction, _run_loop() itself is ~400 lines. It could be further broken into named sub-functions within agent_loop.py (e.g., _handle_pending_issue(), _handle_catchup(), _handle_no_agent()) without moving them to separate files. This would be a good follow-up.

Files to Change

scripts/agent_loop.py — split into 5 files
scripts/test_agent_loop.py — split into 5 test files (in tandem)
ci/otel-receiver.py, scripts/deploy_playstore.py, scripts/generate_build_history.py, scripts/deploy_cron.py — no change needed, already small and cohesive

## Implementation Plan: Split Big Source Files ### Summary of Current File Sizes | File | Lines | Role | |------|-------|------| | `scripts/agent_loop.py` | 991 | Core CI/CD orchestration | | `scripts/test_agent_loop.py` | 938 | Unit tests for above | | `scripts/test_deploy_playstore.py` | 200 | Play Store deployment tests | | `ci/otel-receiver.py` | 195 | OpenTelemetry trace receiver | | `scripts/generate_build_history.py` | 181 | Build artifact documentation | | `scripts/deploy_playstore.py` | 113 | Play Store upload | | `scripts/test_generate_build_history.py` | 113 | Build history tests | | `scripts/deploy_cron.py` | 55 | Website deployment trigger | --- ### Recommendation: Only Split `agent_loop.py` The other files are all under ~200 lines and have a single, cohesive responsibility each — they do not need splitting. `agent_loop.py` at 991 lines is the only file where splitting would genuinely improve maintainability. `test_agent_loop.py` (938 lines) should be split in tandem with `agent_loop.py`, so each test module mirrors its source module. --- ### Proposed Module Split for `agent_loop.py` The file has five clearly separable concerns: #### 1. `agent_loop_state.py` (~80 lines) State file I/O and heartbeat tracking. - `_read_state()`, `_write_state()`, `_clear_state()` - `_update_heartbeat()` - `_find_session_uuid()` #### 2. `agent_loop_agents.py` (~120 lines) Agent lifecycle management. - `_start_agent()`, `_agent_alive()`, `_agent_age_seconds()`, `_kill_agent()` - `_is_claude_process()` #### 3. `agent_loop_codeberg.py` (~120 lines) Codeberg/Forgejo API wrappers. - `_fgj()`, `_tea_get()` - `_set_labels()`, `_close_issue()`, `_comment_issue()` - `_ready_issues()`, `_to_plan_issues()`, `_open_issue_prs()` - `_issue_url()`, `_ci_run_url()` #### 4. `agent_loop_ci.py` (~130 lines) CI run querying and PR merge operations. - `_latest_main_ci_run()`, `_latest_ci_run_for_branch()`, `_latest_ci_run_for_pr()` - `_find_pr_for_branch()`, `_merge_pr()` - `_handle_pr_still_open_after_merge()` #### 5. `agent_loop.py` (~450 lines, the main orchestrator) Keeps only the top-level orchestration logic. - Configuration constants (`REPO`, `LABEL_*`, etc.) - `_run_loop()` — the main state machine - `cmd_list()`, `cmd_monitor()`, `main()` - Imports from the four modules above #### 6. Test files split in parallel - `test_agent_loop_state.py` — tests for `TestStateFile`, `TestHeartbeat`, `TestFindSessionUuid` - `test_agent_loop_agents.py` — tests for `TestAgentAlive`, `TestIsClaudeProcess`, `TestKillAgent`, `TestStartAgent` - `test_agent_loop_codeberg.py` — tests for `TestUrlHelpers`, `TestOutputFormat` - `test_agent_loop_ci.py` — tests for `TestLatestMainCiRun`, `TestLatestCiRunForBranch`, `TestMergeFailsOpen` - `test_agent_loop.py` (reduced) — keeps `TestMain`, `TestPendingCi`, `TestRunLoopResumeCommand`, `TestCatchupSkipsQuestionIssues` --- ### Approach 1. Create new modules by moving functions — no logic changes, no renames. 2. Replace each moved function in `agent_loop.py` with an import from its new home. 3. Split the test file in the same commit so tests continue to pass after each move. 4. Run `python -m pytest scripts/` after each module extraction to verify nothing broke. --- ### Risks and Open Questions - **Circular imports:** The main orchestration in `agent_loop.py` calls helpers in all four sub-modules. As long as the sub-modules do not import from `agent_loop.py` (they should not need to), there is no risk of circular imports. - **Constants shared across modules:** `REPO`, `LABEL_*`, and `ALLOWED_ISSUE_AUTHORS` are used in both `agent_loop_codeberg.py` and `agent_loop.py`. Options: (a) keep them in `agent_loop.py` and pass as arguments, or (b) move to a `agent_loop_config.py` constants file. Option (b) is cleaner. - **Test isolation:** Current tests mock `subprocess` and `os` at the `agent_loop` module level. After splitting, each test module must mock at the correct module path (e.g., `agent_loop_agents.subprocess` instead of `agent_loop.subprocess`). This requires careful attention during the split. - **`_run_loop()` length:** Even after extraction, `_run_loop()` itself is ~400 lines. It could be further broken into named sub-functions within `agent_loop.py` (e.g., `_handle_pending_issue()`, `_handle_catchup()`, `_handle_no_agent()`) without moving them to separate files. This would be a good follow-up. --- ### Files to Change - `scripts/agent_loop.py` — split into 5 files - `scripts/test_agent_loop.py` — split into 5 test files (in tandem) - `ci/otel-receiver.py`, `scripts/deploy_playstore.py`, `scripts/generate_build_history.py`, `scripts/deploy_cron.py` — **no change needed**, already small and cohesive

Sign in to join this conversation.

Branches Tags

main

issue-563-agentloop-validation

dummy-pr-test

issue-560-fix-firebase-run-url

issue-539-stable-imap-uid

issue-533-shared-email-list

plan-issue-555

drop-nix

plan-issue-484

plan-issue-539

plan-issue-535

plan-issue-474

plan-issue-533

fix-dagger-engineless-precommit

issue-521-fix-deploy-yml-wait-time-api

issue-502-fix-email-id-collision-mailbox

issue-492-eliminate-duplicate-build-runner

issue-494-website-change-detection

issue-491-parallelize-check

issue-478-fix-stalwart-dual-stack-bind

issue-475-allowed-addresses-glob

issue-473-search-result-reorder

issue-453-update-agentloop-defaults

issue-466-structured-search

issue-505-exclude-chaos-monkey-from-regular-ci

issue-509-fix-search-result-sorting

fix-ink-sparkle-remaining-tests

issue-506-fix-search-emails-tests

issue-504-runner-wait-time

issue-488-search-notes

issue-472-changelog-issue-links

issue-501-folder-search-local-sqlite

issue-486-fix-stale-test-shader-mismatch

fix/prevent-settled-search-rerun-473

issue-467-fix-search-stale-results

issue-446-installed-versions-in-changelog

issue-462-fix-pr

issue-448-chaos-monkey-test

issue-436-notes-on-emails

issue-429-unify-mail-display

issue-422-move-to-folder-create-new

issue-414-ensure-not-run-as-root

issue-424-unify-email-list-views

issue-419-trusted-senders-page

issue-425-fix-prs

test-foo

issue-421-bug-report

issue-383-fix-ci

issue-394-fix-deploy-flutter-version

issue-391-fix-ci-double-trigger

issue-376-combined-inbox-v2

issue-376-combined-inbox

issue-384-fix-open-prs

sops-migrate

issue-339-safe-first-on-imap-fetch

issue-340-try-catch-measure-height

issue-342-pin-intl-version

issue-341-guard-threademails-last

issue-335-agentloop-code-test

issue-329-fix

issue-315-fix

issue-320-fix

issue-325-fix

issue-312-fix

issue-311-fix

issue-305-fix

issue-304-fix

issue-299-fix

issue-300-fix

issue-298-fix

issue-296-fix

issue-294-fix

issue-289-fix

issue-288-fix

issue-287-fix

issue-286-fix

issue-277-fix

issue-282-fix

issue-280-fix

issue-272-fix

issue-268-fix

issue-267-fix

issue-266-fix

issue-258-fix

issue-260-fix

issue-257-fix

issue-253-fix

issue-216-fix

issue-251-fix

issue-249-fix

issue-question-fixes

issue-235-fix

issue-236-fix-v2

issue-237-fix

issue-236-fix

issue-228-fix

issue-217-fix

issue-214-fix

issue-213-fix

issue-208-fix

issue-205-fix

issue-204-fix

issue-203-fix

issue-202-fix

issue-129-fix

issue-161-fix

issue-160-fix

issue-201-fix

issue-210-fix

issue-198-fix

issue-200-fix

issue-144-fix

issue-199-fix

fix/playstore-upload-use-requests

issue-193-fix

issue-186-fix

issue-185-fix

issue-192-fix

issue-183-fix

issue-175-fix

issue-172-fix

issue-171-fix

issue-167-fix

issue-136-fix

issue-162-fix

issue-179-fix

issue-155-fix

issue-154-fix

issue-152-fix

issue-151-fix

issue-141-fix

issue-150-fix

issue-164-fix

migrate-to-dagger

task/d1-ci-matrix

task/a4-typeconverter-json

task/u7-onboarding-walkthrough

task/d3-sync-doc

task/a5-layer-boundary-lint

task/t5-golden-tests

task/p5-date-cache

task/s4-link-handling

task/p3-html-parse-isolate

task/u8-mark-all-read

task/u3-recent-searches

task/a3-jmap-injectable-http-client

task/r5-tls-error-handling

fix/playstore-redirect-retry

task/t3-repository-contract-tests

task/p2-email-list-pagination

task/p1-fts5-search

fix/playstore-upload-timeout

task/a1-email-detail-notifier

fix/upgrade-workmanager-0.9

fix/android-core-library-desugaring

task/p4-db-indexes

task/r3-html-error-boundary

task/d2-check-coverage

task/a2-email-tile

task/t4-migration-tests

task/t2-widget-tests

task/t1-email-repo-coverage

task/u6-connection-status

task/u4-push-notifications

task/u2-draft-sync

task/u1-list-unsubscribe

task/s2-hostname-validation

task/r6-reliability-fuzz-tests

task/r4-sync-error-banner

task/r2-force-resync

task/r1-undo-history-persistence

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: guettli/sharedinbox#254

Split big source files #254

Implementation Plan: Split Big Source Files

File Inventory

agent_loop.py — Internal Structure

Proposed Split

Option A: Extract two helper modules (recommended)

Option B: Extract three helper modules

Option C: No split (status quo)

Impact on Tests

Risks and Open Questions

Recommendation

Implementation Plan: Split Big Source Files

Summary of Current File Sizes

Recommendation: Only Split agent_loop.py

Proposed Module Split for agent_loop.py

1. agent_loop_state.py (~80 lines)

2. agent_loop_agents.py (~120 lines)

3. agent_loop_codeberg.py (~120 lines)

4. agent_loop_ci.py (~130 lines)

5. agent_loop.py (~450 lines, the main orchestrator)

6. Test files split in parallel

Approach

Risks and Open Questions

Files to Change

Recommendation: Only Split `agent_loop.py`

Proposed Module Split for `agent_loop.py`

1. `agent_loop_state.py` (~80 lines)

2. `agent_loop_agents.py` (~120 lines)

3. `agent_loop_codeberg.py` (~120 lines)

4. `agent_loop_ci.py` (~130 lines)

5. `agent_loop.py` (~450 lines, the main orchestrator)