Deploy failed #366

Closed
opened 2026-06-03 17:40:26 +00:00 by guettli · 1 comment
guettli commented 2026-06-03 17:40:26 +00:00 (Migrated from codeberg.org)
Deploy failed, fix: https://codeberg.org/guettli/sharedinbox/actions/runs/1424/jobs/1/attempt/1
guettlibot commented 2026-06-03 18:05:03 +00:00 (Migrated from codeberg.org)

I'll start by examining the repository structure and recent changes to understand what caused the deploy failure. Let me look at the CI configuration and recent commits to understand the failure. Let me examine the CI/CD workflow files and the Taskfile to understand the deploy pipeline. Now let me look at the deploy workflow file. Let me check the scripts and the runner Dockerfile. Let me look at the runner Dockerfile and recent commit details. Let me look at the other recent commit and what the DAGGER.md contains. Let me check the dagger.json and what the deploy.yml job 1 could be. Let me try to check the CI logs using the available tools. I can see "Update Website" is consistently failing. Let me investigate. Let me try to get the CI logs using the script. Found the issue: the "Verify Play Store deployment" step fails because pip can't install packages in the externally-managed Python environment (PEP 668 on Ubuntu 24.04). Let me also check the other failing jobs. Let me check the history of setup_dagger_remote.sh and see what changed. Let me check the previous successful deploy run to see if the SSH steps ran. Let me check whether any of the previous "successful" deploys actually ran the SSH steps. Let me look at the SOPS secrets file structure and understand how the SSH key is stored. Let me also check the website.yml failure and look at the verify_playstore_deploy.py script. Let me check the website.yml failure log to confirm the root cause.

Now I have a complete picture of all failures:

  1. Job 1 (Play Store): Fails at "Verify Play Store deployment" — pip install google-auth requests fails with PEP 668 (externally-managed Python on Ubuntu 24.04)
  2. Jobs 2, 3 (APK/Linux deploy) + website: All fail with Load key "/root/.ssh/id_ed25519": error in libcrypto inside the Dagger Alpine container — this is the first time the SSH deploy actually ran (all previous deploys had every job skipped)
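
One way to check the second diagnosis is to look at the raw bytes of the key file before ssh reads it. The sketch below is hypothetical: the helper name check_key_bytes is not from the repository, and it assumes only standard POSIX tools. It counts carriage returns and reports the last byte in hex, so CRLF endings or a missing final newline become visible.

```shell
# Hypothetical helper (not from the repository): report byte-level oddities
# in a key file that can make the ssh client fail to load it.
check_key_bytes() {
  file="$1"
  # Keep only carriage returns and count them; CRLF endings leave one per line.
  cr_count="$(tr -cd '\r' < "$file" | wc -c | tr -d ' ')"
  # Show the last byte as hex; a file that ends with a newline reports 0a.
  last_byte="$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')"
  printf 'cr=%s last=%s\n' "$cr_count" "$last_byte"
}
```

Called as check_key_bytes /root/.ssh/id_ed25519 inside the container, a clean key would be expected to report cr=0 last=0a.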

Let me fix all three issues. Now I'll make the three fixes. Now fix the export_secret function to avoid double trailing newlines. Now fix the Deployer in ci/main.go to normalize the SSH key (strip CRLF endings). Now let me verify the changes look correct. The changes look good. Let me now create the branch and commit.

The PR was created at https://codeberg.org/guettli/sharedinbox/pulls/369.

Here's a summary of what I found and fixed:

Root cause analysis of CI run #1424 failures:

  1. Play Store job (jobs/1): The Verify Play Store deployment step called pip install google-auth requests directly on the Ubuntu 24.04 runner, which fails with PEP 668 ("externally-managed-environment"). The || true silenced the pip error, but then python3 scripts/verify_playstore_deploy.py failed with ModuleNotFoundError: No module named 'google'.

    • Fix: switched to python3 -m venv /tmp/playstore-venv + install into the venv
  2. APK deploy, Linux deploy, and website update jobs: All fail with Load key "/root/.ssh/id_ed25519": error in libcrypto inside the Dagger Alpine 3.21 container. This was the first time these jobs actually ran — every previous "successful" deploy run had all deploy jobs skipped (because no android/linux files had changed since the last successful deploy-playstore run, which turned out to never have happened).

    • Fix 1 in setup_dagger_remote.sh: export_secret was using printf '%s\n' "$value" which adds a second trailing newline to SSH private keys (which already end with \n), potentially corrupting PEM parsing.
    • Fix 2 in ci/main.go Deployer: mounts the SSH key at a raw path, then runs tr -d '\r' to strip Windows-style CRLF line endings before writing the final key file. Stray CR bytes in the key content can cause "error in libcrypto" in the OpenSSH client inside the Alpine container.
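
The fixes summarized above can be sketched in shell roughly as follows. This is an illustration under assumptions, not the code from the PR: the function names run_in_venv and normalize_key are hypothetical, and the real changes live in the workflow step, setup_dagger_remote.sh, and ci/main.go.

```shell
# Hypothetical sketch of the fixes; function names are illustrative only.

# PEP 668 fix: install into a throwaway venv instead of the system Python.
run_in_venv() {
  venv_dir="$1"
  script="$2"
  shift 2
  python3 -m venv "$venv_dir" || return 1
  # Remaining arguments are package names. There is no "|| true" here,
  # so an install failure stops the step instead of being hidden.
  "$venv_dir/bin/python" -m pip install "$@" || return 1
  "$venv_dir/bin/python" "$script"
}

# Key fix: drop carriage returns and end the key with exactly one newline.
normalize_key() {
  # Command substitution strips every trailing newline; printf adds one back.
  key="$(tr -d '\r')"
  printf '%s\n' "$key"
}
```

Possible usage, with raw_key and final_key as placeholder paths: run_in_venv /tmp/playstore-venv scripts/verify_playstore_deploy.py google-auth requests, and normalize_key < raw_key > final_key. Stripping and re-adding the final newline in one place avoids both the doubled newline and the missing one.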
Reference: guettli/sharedinbox#366