Add timeouts to all CI/CD tasks and runner scripts #415

guettli commented

2026-06-04 07:58:55 +00:00

(Migrated from codeberg.org)

Add timeouts to all CI/CD tasks and runner scripts

Goal

Implement execution time limits on all automation tasks, wrapper scripts, and CI/CD workflow jobs to prevent hung or looping processes from running indefinitely.

Action Items

Workflow Timeouts: Add explicit job-level timeouts (timeout-minutes) to all Forgejo Action workflows (such as ci.yml, deploy.yml, firebase-tests.yml, renovate.yml).
Process Timeouts: Wrap long-running or network-dependent commands (like dagger call or remote SSH executions) with the Linux timeout command in runner scripts (e.g., timeout 30m dagger call ...).
Monitoring/Cleanup: Add basic logging/warnings for tasks that run longer than expected.
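
As a minimal illustration of the process-level limit, assuming GNU coreutils `timeout` is available on the runner: the sketch below uses `sleep` as a stand-in for a hung command (it is not one of the project's commands) and shows how an expired limit is reported.

```shell
#!/usr/bin/env bash
# Stand-in for a hung command: `sleep 5` is bounded to 1 second.
# `timeout` accepts a unit suffix (s, m, h), so `30m` means 30 minutes.
status=0
timeout 1s sleep 5 || status=$?

# GNU timeout reports an expired limit as exit status 124.
echo "exit status after expiry: ${status}"

# A command that finishes within the limit keeps its own exit status.
ok_status=0
timeout 5s true || ok_status=$?
echo "exit status within limit: ${ok_status}"
```

Because the expiry is visible only through the exit status, any wrapper script that wants to log a timeout has to check for 124 explicitly.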


guettlibot commented

2026-06-04 08:05:03 +00:00

(Migrated from codeberg.org)

Now I have a complete picture. Here is the implementation plan:

Implementation Plan: Add Timeouts to All CI/CD Tasks and Runner Scripts

Current State

After auditing all workflows and scripts:

Already have timeouts: deploy.yml (all 5 jobs), firebase-tests.yml (both jobs), renovate.yml, website.yml
Missing timeouts: ci.yml → check job; windows-nightly.yml → windows-nightly job (currently disabled with if: false)
Script network calls without timeouts: ssh-keyscan and ssh -f -N -L tunnel creation in scripts/setup_dagger_remote.sh (the subsequent dagger core --help already uses timeout 45)
Taskfile dagger call invocations without timeouts: test-backend, integration-ui, sync-reliability, ci-graph, deploy-linux, build-android-bundle, upload-android-bundle, publish-android, deploy-apk, publish-website; check-dagger already uses timeout --kill-after=10 600 ✅
scripts/run_firebase_test.sh: The dagger call inside _run() has no per-attempt timeout

Step 1 — Add missing `timeout-minutes` to workflow jobs

File: .forgejo/workflows/ci.yml

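Add timeout-minutes: 60 to the check job. The inner check-dagger task already enforces a 600 s (10 min) Dagger timeout with up to 3 retries, which bounds the Dagger portion at roughly 30 to 40 min, depending on whether the first attempt counts toward the 3. A 60 min ceiling is therefore safe and also covers checkout and Dagger setup overhead.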

File: .forgejo/workflows/windows-nightly.yml

Add timeout-minutes: 90 to the windows-nightly job. The job is currently gated with if: false (no runner registered), but adding the timeout now means it is correctly bounded when a Windows runner is eventually registered. 90 min accounts for slower Windows Flutter builds.
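
The 60 min ceiling for the check job can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the helper name `worst_case_seconds` is hypothetical, "up to 3 retries" is read as at most 4 attempts (an assumption), and any pause between retries is not counted.

```shell
#!/usr/bin/env bash
# Worst-case wall-clock budget for a retried, time-limited command.
# Arguments: attempts, per-attempt limit (s), kill-after grace (s).
worst_case_seconds() {
  local attempts=$1 limit=$2 grace=$3
  echo $(( attempts * (limit + grace) ))
}

# check-dagger figures from the plan: 600 s limit, 10 s kill-after.
# Assumption: "up to 3 retries" means at most 4 attempts in total.
check_budget=$(worst_case_seconds 4 600 10)
ceiling=$(( 60 * 60 ))   # timeout-minutes: 60

echo "check-dagger worst case: ${check_budget} s of ${ceiling} s"
echo "headroom for checkout and setup: $(( ceiling - check_budget )) s"
```

Under these assumptions the Dagger portion needs at most 2440 s, which leaves 1160 s of the 3600 s ceiling for everything else in the job.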

Step 2 — Add timeouts to network operations in `scripts/setup_dagger_remote.sh`

Two calls can hang indefinitely if the remote host is unreachable:

ssh-keyscan — wrap with timeout 30: timeout 30 ssh-keyscan -H "$DAGGER_ENGINE_HOST" >> ~/.ssh/known_hosts 2>/dev/null
SSH tunnel creation — wrap with timeout 30: timeout 30 ssh -i ~/.ssh/dagger_key -o StrictHostKeyChecking=no -f -N -L 8080:localhost:1774 "dagger@$DAGGER_ENGINE_HOST". The -f flag causes ssh to background itself once the tunnel is established, so the foreground process exits quickly on success; timeout 30 catches the case where the connection never completes.

On failure of either call, the existing set -euo pipefail aborts the script. A call that hits the limit surfaces as exit status 124 from timeout.
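
Note that `set -e` aborts without printing anything of its own, so a named error message has to come from the script. One possible shape is sketched below, assuming GNU `timeout`; the helper `bounded` is hypothetical and not part of `setup_dagger_remote.sh`, and `sleep` stands in for an unreachable host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# bounded LIMIT_SECONDS COMMAND [ARGS...]
# Runs the command under `timeout` and names it on stderr if the limit expires.
# Hypothetical helper for illustration only.
bounded() {
  local limit=$1 status=0
  shift
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "error: '$1' timed out after ${limit}s" >&2
  fi
  return "$status"
}

# Stand-in for an unreachable host: `sleep 5` bounded to 1 second.
# The `|| demo_status=$?` keeps this demo alive; a bare `bounded ...` line
# would abort the script under `set -e` with status 124.
demo_status=0
demo_msg=$(bounded 1 sleep 5 2>&1) || demo_status=$?
echo "status=${demo_status} message=${demo_msg}"
```

In the real script the wrapped commands would be the `ssh-keyscan` and `ssh -f -N -L` calls quoted above, with a limit of 30.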

Step 3 — Wrap `dagger call` in Taskfile tasks

Apply timeout --kill-after=10 <N> before dagger call in each task. Use --kill-after=10 so a SIGKILL follows 10 s after SIGTERM if Dagger does not respond (mirroring the existing check-dagger pattern).

Timeouts by category:

| Task | Timeout | Rationale |
|---|---|---|
| `test-backend`, `integration-ui`, `sync-reliability`, `ci-graph` | `600` s (10 min) | Test/query pipelines; the CI job already caps at 60 min |
| `deploy-linux`, `publish-android`, `deploy-apk`, `build-android-bundle`, `upload-android-bundle`, `publish-website` | `1800` s (30 min) | Build + deploy pipelines; CI job caps at 60 min |

Exclusions:

stalwart — intentionally long-running dev server; do not add a timeout
check-dagger — already has its own timeout + retry logic ✅
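
To show what `--kill-after` adds, the sketch below runs a stand-in process that ignores SIGTERM (it is not a real `dagger call`; the short durations are chosen only to keep the demo fast). It assumes GNU coreutils `timeout`.

```shell
#!/usr/bin/env bash
# Stand-in for a process that does not react to SIGTERM:
# the inner shell ignores TERM, and `sleep` inherits that disposition.
status=0

# SIGTERM is sent after 1 s; SIGKILL follows 1 s later because the
# process is still alive.
timeout --kill-after=1 1 sh -c 'trap "" TERM; sleep 10' || status=$?

# With GNU timeout, a forced SIGKILL is reported as 137 (128 + 9), not 124.
echo "exit status after forced kill: ${status}"
```

One consequence worth keeping in mind: a script that checks only for exit status 124 will not recognise the forced-kill case, so matching 137 as well may be useful wherever `--kill-after` is in play.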

Step 4 — Add per-attempt timeout to `scripts/run_firebase_test.sh`

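Inside the _run() function, wrap the dagger call with timeout --kill-after=10 2400 (40 min per attempt). Firebase Test Lab jobs can take 20–30 min, so a single 40 min attempt fits under the 60 min job-level ceiling in firebase-tests.yml. Three full-length attempts (3 × 40 min = 120 min) would exceed that ceiling, so the job-level timeout remains the effective bound on total retry time.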
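
A per-attempt limit inside a retry loop could look roughly like the sketch below. It is not the project's `_run()`: the helper `run_with_retries` and its arguments are hypothetical, `sleep` stands in for the `dagger call`, and GNU `timeout` is assumed.

```shell
#!/usr/bin/env bash
# run_with_retries MAX_ATTEMPTS LIMIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: each attempt is bounded by `timeout`, and a
# timed-out attempt is reported before the next try.
run_with_retries() {
  local max_attempts=$1 limit=$2 attempt=1 status=0
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    status=0
    timeout --kill-after=10 "$limit" "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$status" -eq 124 ]; then
      echo "::warning::attempt ${attempt}/${max_attempts} timed out after ${limit}s"
    fi
    attempt=$(( attempt + 1 ))
  done
  return "$status"
}

# Stand-in workload: `sleep 3` never finishes within the 1 s limit.
final=0
log=$(run_with_retries 2 1 sleep 3) || final=$?
echo "$log"
echo "final status: ${final}"
```

The helper returns the status of the last attempt, so a caller can still distinguish a timeout (124) from an ordinary test failure.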

Step 5 — Monitoring / warnings for long-running tasks

The issue requests "basic logging/warnings for tasks that run longer than expected." Minimal, idiomatic additions:

setup_dagger_remote.sh: After the tunnel is established, print the elapsed time of each network call using SECONDS (built-in bash variable). Emit a ::warning:: line if ssh-keyscan or tunnel setup takes more than 10 s.
Taskfile tasks: No new logging is needed beyond timeout itself. When the limit is hit, timeout exits with code 124 (or 137 if the --kill-after SIGKILL had to be sent), and the task name in the output shows which step hung. The existing check-dagger pattern of emitting a message on exit 124 (e.g. "hung in teardown after success; treating as exit 0") can be reused if that edge case applies to other tasks.
Workflows: Forgejo Actions already logs step wall-clock times in the run UI; the new timeout-minutes values make the hard limit explicit without any additional logging code.
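
The elapsed-time warning for `setup_dagger_remote.sh` could be sketched as follows, using bash's built-in `SECONDS` counter. The helper `warn_if_slow` is hypothetical, `sleep` and `true` stand in for the network calls, and the threshold is shortened from the planned 10 s so the demo finishes quickly.

```shell
#!/usr/bin/env bash
# warn_if_slow THRESHOLD_SECONDS LABEL COMMAND [ARGS...]
# Hypothetical helper: times a command with bash's SECONDS counter and emits
# a ::warning:: annotation when it takes longer than the threshold.
warn_if_slow() {
  local threshold=$1 label=$2 start=$SECONDS status=0 elapsed
  shift 2
  "$@" || status=$?
  elapsed=$(( SECONDS - start ))
  echo "${label} took ${elapsed}s"
  if [ "$elapsed" -gt "$threshold" ]; then
    echo "::warning::${label} took ${elapsed}s (expected at most ${threshold}s)"
  fi
  return "$status"
}

# Stand-ins for the network calls, with a 1 s threshold for the demo.
slow_log=$(warn_if_slow 1 "slow step" sleep 3)
fast_log=$(warn_if_slow 1 "fast step" true)
printf '%s\n%s\n' "$slow_log" "$fast_log"
```

`SECONDS` has whole-second resolution, which is coarse but sufficient for a threshold in the range of 10 s.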

Files to change (summary)

| File | Change |
|---|---|
| `.forgejo/workflows/ci.yml` | Add `timeout-minutes: 60` to `check` job |
| `.forgejo/workflows/windows-nightly.yml` | Add `timeout-minutes: 90` to `windows-nightly` job |
| `scripts/setup_dagger_remote.sh` | `timeout 30` on `ssh-keyscan`; `timeout 30` on `ssh -f -N -L`; elapsed-time warnings |
| `Taskfile.yml` | `timeout --kill-after=10 <N>` on all bare `dagger call` invocations (10 tasks) |
| `scripts/run_firebase_test.sh` | `timeout --kill-after=10 2400` inside `_run()` around `dagger call` |


guettlibot commented

2026-06-04 20:45:02 +00:00

(Migrated from codeberg.org)

The PR is open at https://codeberg.org/guettli/sharedinbox/pulls/432.

Here's what was implemented for issue #415:

Workflow jobs — added missing timeout-minutes:

ci.yml → check job: 60 min
windows-nightly.yml → windows-nightly job: 90 min (ready for when a Windows runner is registered)

scripts/setup_dagger_remote.sh — two network calls that could hang indefinitely are now wrapped with timeout 30:

ssh-keyscan (host key retrieval)
ssh -f -N -L (background tunnel creation)
Both emit a ::warning:: annotation if they take more than 10 s.

Taskfile.yml — all bare dagger call invocations are now wrapped with timeout --kill-after=10 <N> (600 s for test/query tasks, 1800 s for build/deploy tasks, 60 s for ci-graph). stalwart (intentional long-running dev server) and check-dagger (already had its own timeout+retry loop) were left untouched.

scripts/run_firebase_test.sh — added timeout --kill-after=10 2400 per attempt around dagger call; the retry loop now detects exit 124 and emits a ::warning:: annotation instead of silently retrying.



Reference: guettli/sharedinbox#415
