Avoid duplicated code #255

Open
opened 2026-05-25 17:28:18 +00:00 by guettli · 1 comment
guettli commented 2026-05-25 17:28:18 +00:00 (Migrated from codeberg.org)

How could duplicated code in the git repo be detected/avoided?

How could duplicated code in the git repo be detected/avoided?
guettlibot commented 2026-05-25 17:33:51 +00:00 (Migrated from codeberg.org)

Implementation Plan: Avoid Duplicated Code

Thanks for raising this. After exploring the codebase I found duplication in three distinct areas, ranging from easy wins to larger structural refactors. Here is what I found and how to address each.


1. CI/CD Workflow Boilerplate (highest-impact, ~180 lines)

What is duplicated:
Three step-blocks appear verbatim in every job in .forgejo/workflows/ci.yml and .forgejo/workflows/deploy.yml (6 occurrences total):

  • Check runner tools – verifies dagger, task, stunnel4, and netcat-openbsd are installed
  • Setup Dagger Remote Engine (via stunnel) – sets four secrets as env vars and calls scripts/setup_dagger_remote.sh
  • Cleanup TLS credentialsrm -rf /tmp/dagger-tls /tmp/stunnel-dagger.conf /tmp/stunnel.pid

Approach – Forgejo reusable workflow:
Forgejo supports the same workflow_call trigger as GitHub Actions. Create a new file .forgejo/workflows/dagger-setup.yml (a reusable workflow) that encapsulates the three steps and exposes the four Dagger secrets as secrets: inputs. Each job in ci.yml and deploy.yml then replaces those three steps with a single uses: reference:

# .forgejo/workflows/dagger-setup.yml  (new file)
on:
  workflow_call:
    secrets:
      DAGGER_STUNNEL_URL: { required: true }
      DAGGER_CA_CERT:     { required: true }
      DAGGER_CLIENT_CERT: { required: true }
      DAGGER_CLIENT_KEY:  { required: true }

jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - name: Check runner tools
        run: |
          command -v dagger  >/dev/null 2>&1 || { echo "ERROR: dagger ..."; exit 1; }
          ...
      - name: Setup Dagger Remote Engine (via stunnel)
        env:
          DAGGER_STUNNEL_URL: ${{ secrets.DAGGER_STUNNEL_URL }}
          ...
        run: scripts/setup_dagger_remote.sh
      - name: Cleanup TLS credentials
        if: always()
        run: rm -rf /tmp/dagger-tls /tmp/stunnel-dagger.conf /tmp/stunnel.pid

Files to change:

  • New: .forgejo/workflows/dagger-setup.yml
  • Edit: .forgejo/workflows/ci.yml (1 job)
  • Edit: .forgejo/workflows/deploy.yml (5 jobs)

Risk: The if: always() on the cleanup step must propagate correctly when calling from a parent job. Test with one job first before rolling out to all five.

Open question: Forgejo's workflow_call implementation may lag GitHub Actions. Verify the self-hosted Forgejo instance version supports it before starting. If not supported, the fallback is to extract the three shell blocks into a single scripts/dagger-job-setup.sh (and scripts/dagger-job-cleanup.sh) and call those scripts from each job – less elegant but zero workflow-engine risk.


2. Python Subprocess Wrapper Pattern (medium, ~25 lines)

What is duplicated:
scripts/agent_loop.py has two helper functions (_fgj, lines 90-97, and _tea_get, lines 100-115) that share the same structure: run a subprocess, check the return code, raise a RuntimeError with combined stderr/stdout, and – for _tea_get – parse JSON and check for an error envelope. The pattern will likely be repeated if more CLI wrappers are added.

Approach – shared private helper:
Extract a small _run_cmd(cmd, parse_json=False) helper at the top of agent_loop.py that handles the subprocess call, error raising, and optional JSON parse. Both _fgj and _tea_get become thin wrappers around it. No new file needed.

Files to change:

  • scripts/agent_loop.py

Risk: Low. The only observable change is that error messages use a slightly different format; the existing tests in scripts/test_agent_loop.py that cover _fgj/_tea_get will catch regressions.


3. Dart Repository Implementations (larger refactor)

What is duplicated:
The eight files in lib/data/repositories/ (AccountRepositoryImpl, DraftRepositoryImpl, EmailRepositoryImpl, MailboxRepositoryImpl, SearchHistoryRepositoryImpl, ShareKeyRepositoryImpl, UndoRepositoryImpl, SyncLogRepositoryImpl) all follow the same constructor + Drift query + stream-wrapping + _toModel conversion pattern.

Approach options:

Option A – Generic base class (simpler):
Introduce an abstract DriftRepositoryBase<TRow, TModel> that captures the shared constructor and query helpers as protected methods. Each concrete class extends it and only implements the Drift TableInfo reference and _toModel. This keeps all logic in plain Dart and is easy to read.

Option B – Code generation (more powerful):
The project already uses build_runner for Drift and Mockito. A custom Builder could generate the boilerplate entirely from the @DriftAccessor annotation. However, this adds build-system complexity and is harder to debug.

Recommended approach: Option A (generic base class) as a first step. If the repository layer grows further, revisit Option B.

Files to change:

  • New: lib/data/repositories/drift_repository_base.dart
  • Edit: each of the 8 *RepositoryImpl files (reduce each by ~20-30 lines)

Risk: Medium. The repositories are used throughout the app and tested via Mockito mocks generated from their interfaces – the interface contracts do not change, so most tests remain unaffected. The main risk is subtly breaking the Drift query generics; run flutter analyze and the full test suite after each migration.


4. Static Analysis Tooling to Prevent Regression

Once the above is done, consider adding tooling to catch future duplication early:

  • jscpd (or dcm for Dart) – copy-paste detector for Dart/YAML. Can be added as a pre-commit hook or CI step.
  • Dart Code Metrics (DCM) – has a dedicated duplicate-code rule and integrates with the existing analysis_options.yaml.
  • YAML lintingyamllint with a custom rule to flag long duplicated blocks (less precise but zero setup).

A .jscpd.json config at the repo root could limit detection to lib/ and scripts/ and set a minimum token threshold (e.g. 50 tokens) to avoid false positives from boilerplate.


Summary / Suggested Order

Priority Area Effort Files
1 CI workflow boilerplate Small 2 workflow files + 1 new
2 Python subprocess helper Tiny agent_loop.py
3 Dart repository base class Medium 9 Dart files
4 Duplication detector in CI Small config + CI step

I suggest tackling them in this order: items 1 and 2 are low-risk and immediately reduce maintenance surface; item 3 is the most impactful in terms of code volume but needs careful testing; item 4 locks in the gains long-term.

## Implementation Plan: Avoid Duplicated Code Thanks for raising this. After exploring the codebase I found duplication in three distinct areas, ranging from easy wins to larger structural refactors. Here is what I found and how to address each. --- ### 1. CI/CD Workflow Boilerplate (highest-impact, ~180 lines) **What is duplicated:** Three step-blocks appear verbatim in every job in `.forgejo/workflows/ci.yml` and `.forgejo/workflows/deploy.yml` (6 occurrences total): - *Check runner tools* – verifies `dagger`, `task`, `stunnel4`, and `netcat-openbsd` are installed - *Setup Dagger Remote Engine (via stunnel)* – sets four secrets as env vars and calls `scripts/setup_dagger_remote.sh` - *Cleanup TLS credentials* – `rm -rf /tmp/dagger-tls /tmp/stunnel-dagger.conf /tmp/stunnel.pid` **Approach – Forgejo reusable workflow:** Forgejo supports the same `workflow_call` trigger as GitHub Actions. Create a new file `.forgejo/workflows/dagger-setup.yml` (a reusable workflow) that encapsulates the three steps and exposes the four Dagger secrets as `secrets:` inputs. Each job in `ci.yml` and `deploy.yml` then replaces those three steps with a single `uses:` reference: ```yaml # .forgejo/workflows/dagger-setup.yml (new file) on: workflow_call: secrets: DAGGER_STUNNEL_URL: { required: true } DAGGER_CA_CERT: { required: true } DAGGER_CLIENT_CERT: { required: true } DAGGER_CLIENT_KEY: { required: true } jobs: setup: runs-on: ubuntu-latest steps: - name: Check runner tools run: | command -v dagger >/dev/null 2>&1 || { echo "ERROR: dagger ..."; exit 1; } ... - name: Setup Dagger Remote Engine (via stunnel) env: DAGGER_STUNNEL_URL: ${{ secrets.DAGGER_STUNNEL_URL }} ... run: scripts/setup_dagger_remote.sh - name: Cleanup TLS credentials if: always() run: rm -rf /tmp/dagger-tls /tmp/stunnel-dagger.conf /tmp/stunnel.pid ``` **Files to change:** - New: `.forgejo/workflows/dagger-setup.yml` - Edit: `.forgejo/workflows/ci.yml` (1 job) - Edit: `.forgejo/workflows/deploy.yml` (5 jobs) **Risk:** The `if: always()` on the cleanup step must propagate correctly when calling from a parent job. Test with one job first before rolling out to all five. **Open question:** Forgejo's `workflow_call` implementation may lag GitHub Actions. Verify the self-hosted Forgejo instance version supports it before starting. If not supported, the fallback is to extract the three shell blocks into a single `scripts/dagger-job-setup.sh` (and `scripts/dagger-job-cleanup.sh`) and call those scripts from each job – less elegant but zero workflow-engine risk. --- ### 2. Python Subprocess Wrapper Pattern (medium, ~25 lines) **What is duplicated:** `scripts/agent_loop.py` has two helper functions (`_fgj`, lines 90-97, and `_tea_get`, lines 100-115) that share the same structure: run a subprocess, check the return code, raise a `RuntimeError` with combined stderr/stdout, and – for `_tea_get` – parse JSON and check for an error envelope. The pattern will likely be repeated if more CLI wrappers are added. **Approach – shared private helper:** Extract a small `_run_cmd(cmd, parse_json=False)` helper at the top of `agent_loop.py` that handles the subprocess call, error raising, and optional JSON parse. Both `_fgj` and `_tea_get` become thin wrappers around it. No new file needed. **Files to change:** - `scripts/agent_loop.py` **Risk:** Low. The only observable change is that error messages use a slightly different format; the existing tests in `scripts/test_agent_loop.py` that cover `_fgj`/`_tea_get` will catch regressions. --- ### 3. Dart Repository Implementations (larger refactor) **What is duplicated:** The eight files in `lib/data/repositories/` (`AccountRepositoryImpl`, `DraftRepositoryImpl`, `EmailRepositoryImpl`, `MailboxRepositoryImpl`, `SearchHistoryRepositoryImpl`, `ShareKeyRepositoryImpl`, `UndoRepositoryImpl`, `SyncLogRepositoryImpl`) all follow the same constructor + Drift query + stream-wrapping + `_toModel` conversion pattern. **Approach options:** *Option A – Generic base class (simpler):* Introduce an abstract `DriftRepositoryBase<TRow, TModel>` that captures the shared constructor and query helpers as protected methods. Each concrete class extends it and only implements the Drift `TableInfo` reference and `_toModel`. This keeps all logic in plain Dart and is easy to read. *Option B – Code generation (more powerful):* The project already uses `build_runner` for Drift and Mockito. A custom `Builder` could generate the boilerplate entirely from the `@DriftAccessor` annotation. However, this adds build-system complexity and is harder to debug. **Recommended approach:** Option A (generic base class) as a first step. If the repository layer grows further, revisit Option B. **Files to change:** - New: `lib/data/repositories/drift_repository_base.dart` - Edit: each of the 8 `*RepositoryImpl` files (reduce each by ~20-30 lines) **Risk:** Medium. The repositories are used throughout the app and tested via Mockito mocks generated from their interfaces – the interface contracts do not change, so most tests remain unaffected. The main risk is subtly breaking the Drift query generics; run `flutter analyze` and the full test suite after each migration. --- ### 4. Static Analysis Tooling to Prevent Regression Once the above is done, consider adding tooling to catch future duplication early: - **[jscpd](https://github.com/kucherenko/jscpd)** (or **dcm** for Dart) – copy-paste detector for Dart/YAML. Can be added as a pre-commit hook or CI step. - **Dart Code Metrics (DCM)** – has a dedicated duplicate-code rule and integrates with the existing `analysis_options.yaml`. - **YAML linting** – `yamllint` with a custom rule to flag long duplicated blocks (less precise but zero setup). A `.jscpd.json` config at the repo root could limit detection to `lib/` and `scripts/` and set a minimum token threshold (e.g. 50 tokens) to avoid false positives from boilerplate. --- ### Summary / Suggested Order | Priority | Area | Effort | Files | |----------|------|--------|-------| | 1 | CI workflow boilerplate | Small | 2 workflow files + 1 new | | 2 | Python subprocess helper | Tiny | `agent_loop.py` | | 3 | Dart repository base class | Medium | 9 Dart files | | 4 | Duplication detector in CI | Small | config + CI step | I suggest tackling them in this order: items 1 and 2 are low-risk and immediately reduce maintenance surface; item 3 is the most impactful in terms of code volume but needs careful testing; item 4 locks in the gains long-term.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: guettli/sharedinbox#255