Compare commits
Author SHA1 Message Date
Thomas SharedInbox and Claude Sonnet 4.6 5b93a59537 fix: survive permanently broken path_provider channel on Android (#192)
Two changes prevent the crash reported on Samsung S1RXS32.50-13-25:

1. Add WidgetsFlutterBinding.ensureInitialized() to callbackDispatcher so
   that Flutter platform channels (including path_provider) are available
   when WorkManager triggers background sync. Without it the channel is
   permanently unavailable in that isolate regardless of how long we wait.

2. Add an Android-specific fallback in _resolveDatabasePath: after all
   back-off retries fail, derive the app files-dir path from
   /proc/self/cmdline (the Android process name equals the package name)
   without a platform channel. This lets the database open on devices
   where path_provider is broken even in the main isolate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 03:41:12 +02:00
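The fallback described in point 2 above can be illustrated outside Dart. This is a minimal shell sketch, assuming a Linux-style /proc; the function names first_cmdline_arg, is_package_name and fallback_db_path and the package name com.example.app are hypothetical. It mirrors only the validation described in the commit (non-empty, contains a dot, no slash) and prints the first candidate path; it does not create the directory or try the second base path.

```shell
# Print the first NUL-terminated argument of a cmdline-style file.
first_cmdline_arg() {
  local file="${1:-/proc/self/cmdline}"
  # Arguments are NUL-separated; turn NULs into newlines and keep the first.
  tr '\0' '\n' < "$file" | head -n 1
}

# Succeed only if the name looks like an Android package name:
# non-empty, contains a dot, contains no slash.
is_package_name() {
  local name="$1"
  [ -n "$name" ] || return 1
  case "$name" in
    */*) return 1 ;;   # a file-system path, e.g. the Dart executable
    *.*) return 0 ;;   # dotted name, e.g. com.example.app
    *)   return 1 ;;
  esac
}

# Print the candidate database path for a package name, or fail.
fallback_db_path() {
  local name="$1"
  is_package_name "$name" || return 1
  printf '/data/user/0/%s/files/sharedinbox.db\n' "$name"
}
```

On a host machine the process name is normally a path such as /usr/bin/dart, so is_package_name rejects it, which is the behaviour the regression test in this change relies on.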
Thomas SharedInbox and Claude Sonnet 4.6 72b25c87ac fix(ci): retry on 'invalid return status code' Dagger disconnect
Adds the Dagger gRPC/HTTP disconnect error to the retry pattern
so transient engine drops during long-running steps (like build_runner)
auto-recover instead of failing the CI job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 03:15:30 +02:00
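The retry pattern mentioned above is a grep over the captured Dagger output. A minimal sketch of that check in isolation, where the function name is_transient_network_error is hypothetical and the alternation is the one that appears in the changed task file:

```shell
# Succeed when the captured Dagger output (given as a file path) contains one
# of the transient network errors that the retry loop treats as retryable.
is_transient_network_error() {
  grep -qE "connection reset|context canceled|connection refused|invalid return status code" "$1"
}
```

A caller would typically combine this with an attempt counter, retrying only while attempts remain and returning the original exit code otherwise.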
Thomas SharedInbox and Claude Sonnet 4.6 931186dc45 fix(ci): replace DinD with plain TCP proxy and simplify Docker discovery
The DinD service approach was crashing the job (exit 2) because the
Forgejo runner on this host does not honour the `options: --privileged`
field for service containers, so dockerd inside DinD could never start.

Root cause of the broader CI failure: dagger-stunnel.service stopped
cleanly (exit 0 → no auto-restart), leaving port 8774 without a
listener. A plain socat TCP proxy (8774→1774) is now running on the
host as a stop-gap until stunnel is restarted.

Changes:
- Remove the docker:27-dind service container from ci.yml entirely
- Simplify "Locate Docker daemon" step — warn instead of failing when
  Docker is unavailable (job fails later at the Dagger step with a
  clearer message)
- Add plain-TCP path to setup_dagger_remote.sh: after a successful nc
  probe, try `dagger version` directly over the target host:port before
  falling back to the TLS stunnel setup; this works with both the socat
  plain-TCP proxy and any future plain-TCP Dagger engine exposure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 02:57:08 +02:00
Thomas SharedInbox and Claude Sonnet 4.6 5abcf55aa7 fix(ci): override DOCKER_TLS_CERTDIR via docker run options and improve Docker discovery
The act runner on Codeberg may not apply the services.env block to the
DinD container, so DOCKER_TLS_CERTDIR defaults to /certs and dockerd
starts with TLS on port 2376 instead of 2375. Fix by passing
--env DOCKER_TLS_CERTDIR= directly via options: so it is always applied
at docker run time.

Also:
- Try the host Docker socket (DooD) first before DinD; many self-hosted
  runners mount /var/run/docker.sock and this is simpler and more reliable.
- Remove the workflow-level DOCKER_HOST override; let the step discover
  and export the correct value instead of pre-forcing tcp://docker:2375.
- Retry DinD by hostname up to 60 s before falling back to scanning.
- Add DNS resolution check (getent hosts docker) and a port 2376 probe
  that surfaces the TLS-still-enabled diagnostic message clearly.
- Improve final diagnostics (IPs, DNS, socket path) to aid future debugging.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 02:10:25 +02:00
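The port 2376 probe described above distinguishes "DinD is up but still speaking TLS" from "DinD is not reachable at all". A minimal sketch of that classification, assuming bash's /dev/tcp redirection and a timeout command are available; the function names port_open and dind_mode are hypothetical:

```shell
# Succeed if host:port accepts a TCP connection within one second.
port_open() {
  timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Classify how a DinD host is reachable: "plain", "tls" or "unreachable".
dind_mode() {
  local host="$1"
  if port_open "$host" 2375; then
    echo "plain"          # DOCKER_TLS_CERTDIR was emptied, as intended
  elif port_open "$host" 2376; then
    echo "tls"            # dockerd started with TLS still enabled
  else
    echo "unreachable"
  fi
}
```

A result of "tls" suggests the DOCKER_TLS_CERTDIR override did not reach the service container.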
Thomas SharedInbox and Claude Sonnet 4.6 68dcee6968 fix(ci): scan all interfaces and full /24 to locate DinD daemon
The previous fallback only scanned .1-.50 of the first interface's
subnet, missing the DinD container when its IP is higher (.51+) or
when the forgejo-jobs network is on a different interface than
hostname -I returned first.

Now iterates all non-loopback IPs from hostname -I, scans each
subnet's full /24 (.1-254), and uses a 0.3 s bash /dev/tcp probe
instead of nc -zw1 to keep the total scan time under ~80 s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 01:52:41 +02:00
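The scan described above can be sketched as follows. This is an illustration under stated assumptions, not the script from the repository: the function names subnet_prefix, tcp_probe and find_docker_host are hypothetical, and it assumes a timeout command that accepts fractional seconds (GNU coreutils does), bash's /dev/tcp redirection, and a hostname that supports -I.

```shell
# Print the /24 prefix (first three octets) of an IPv4 address.
subnet_prefix() {
  local ip="$1"
  printf '%s\n' "${ip%.*}"
}

# Succeed if a TCP connection to host:port opens within the timeout.
tcp_probe() {
  local host="$1" port="$2" wait="${3:-0.3}"
  timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Scan .1-.254 of each local IPv4 address's /24 for an open port and print
# the first match as a DOCKER_HOST-style URL.
find_docker_host() {
  local port="${1:-2375}" ip prefix i
  for ip in $(hostname -I 2>/dev/null); do
    case "$ip" in 127.*|*:*) continue ;; esac   # skip loopback and IPv6
    prefix="$(subnet_prefix "$ip")"
    for i in $(seq 1 254); do
      if tcp_probe "$prefix.$i" "$port"; then
        printf 'tcp://%s.%s:%s\n' "$prefix" "$i" "$port"
        return 0
      fi
    done
  done
  return 1
}
```

With a 0.3 s timeout, one fully unresponsive /24 costs at most 254 × 0.3 s ≈ 76 s, which is where a bound of roughly 80 s per subnet comes from. Hosts that actively refuse the connection return much faster.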
Thomas SharedInbox and Claude Sonnet 4.6 2a92c8766f fix(ci): replace ip route with hostname -I to find DinD subnet
The runner image does not have iproute2 installed, so `ip route` fails
with exit 127. Use `hostname -I` (available everywhere) to get the
container's own IP and derive the /24 prefix for the DinD port scan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 01:37:11 +02:00
Thomas SharedInbox and Claude Sonnet 4.6 49ad2ff25d fix(ci): add --privileged to DinD and fallback IP scan for docker hostname
The docker:27-dind service container needs --privileged to start dockerd;
without it the container exits immediately and its DNS alias is removed,
causing the embedded DNS to return SERVFAIL for 'docker'.

Codeberg's act runner may also not register the service key as a network
alias at all. Add a 'Locate Docker daemon' step that tries the configured
DOCKER_HOST first, then falls back to scanning the local /24 for port 2375
so the local Dagger engine can connect to DinD regardless.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 01:27:57 +02:00
Thomas SharedInbox and Claude Sonnet 4.6 c487714b63 fix(ci): add DinD service so local Dagger fallback works when remote engine is down
When the remote Dagger engine (stunnel/port 8774) is unreachable, Dagger
falls back to a local engine which requires a Docker daemon. The job container
does not have /var/run/docker.sock mounted, so the fallback was failing with
"connect: no such file or directory".

Add a docker:27-dind service to the CI job and set DOCKER_HOST=tcp://docker:2375
so Dagger can start a local engine when the remote engine is unavailable.

Also guard the Firebase and Play Store steps in deploy.yml so they are skipped
gracefully when the relevant secrets are not configured.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 01:12:11 +02:00
Thomas SharedInbox and Claude Sonnet 4.6 f560d9d921 fix(ci): fall back to local Dagger engine when remote is unreachable
The remote Dagger engine probe exits with an error when the server is
down, failing CI before any tests run. Change the probe to exit 0 on
timeout and print a warning instead; with _DAGGER_RUNNER_HOST unset
Dagger will start a local engine and CI can still complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 01:00:52 +02:00
Thomas SharedInbox and Claude Sonnet 4.6 9eba422c67 fix(ci): retry Dagger engine probe and prune cache after check
The Dagger engine stopped responding (connection refused) after the
previous run exhausted disk space and crashed it. Two changes:

1. setup_dagger_remote.sh: retry the nc probe up to 5 times with 30 s
   delays so a transient crash/restart window doesn't immediately fail
   the job.

2. ci.yml: add a post-check prune step (if: always()) so the engine
   cache is cleaned up after every run, reducing the chance of disk
   exhaustion on the next run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 00:52:12 +02:00
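The probe retry from point 1 above follows a common bounded-retry shape. A minimal generic sketch, in which the helper name retry_probe is hypothetical and the probe command is passed in by the caller:

```shell
# Run a probe command up to MAX times, sleeping DELAY seconds between tries.
# Returns 0 on the first success and 1 if every attempt fails.
retry_probe() {
  local max="$1" delay="$2" attempt
  shift 2
  for attempt in $(seq 1 "$max"); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final attempt; there is nothing left to wait for.
    if [ "$attempt" -lt "$max" ]; then
      echo "probe failed (attempt $attempt/$max), waiting ${delay}s..." >&2
      sleep "$delay"
    fi
  done
  return 1
}
```

With the values from this commit, a call might look like `retry_probe 5 30 nc -zw 5 "$host" "$port"`, which waits at most four 30 s delays before giving up.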
Thomas SharedInbox and Claude Sonnet 4.6 e7d61e8ee1 fix(ci): prune Dagger cache on disk-space error and before check
Previous fix (retry × 3 with 60 s sleep) was not enough: all three
attempts still failed because the engine cache stayed full throughout.
Add an explicit `dagger query '{ engine { localCache { prune } } }'`
call (a) as a proactive step in ci.yml right after the stunnel setup,
and (b) inside the retry handler before each back-off sleep (now 90 s
instead of 60 s). The prune evicts stale execution-cache snapshots
(e.g. old pubspec.lock layers) so fresh disk is available when flutter
pub get runs. The `|| true` guard makes the prune non-fatal if the
query syntax changes between Dagger versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 00:39:40 +02:00
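The prune-then-back-off handling described above can be sketched as a standalone wrapper. The names run_with_disk_retry, prune_cache and BACKOFF_SECONDS are hypothetical; the dagger query is the one quoted in the commit message, and whether that query syntax remains valid across Dagger versions is not guaranteed, hence the `|| true`.

```shell
# Evict the Dagger engine cache. The `|| true` keeps a failed prune non-fatal.
prune_cache() {
  dagger query '{ engine { localCache { prune } } }' 2>/dev/null || true
}

# Run "$@" up to 3 times. On "No space left on device", prune the cache and
# wait before the next attempt; any other failure is returned immediately.
run_with_disk_retry() {
  local backoff="${BACKOFF_SECONDS:-90}" out attempt rc
  out="$(mktemp)"
  for attempt in 1 2 3; do
    # Capture the exit status without tripping `set -e` in the caller.
    "$@" >"$out" 2>&1 && rc=0 || rc=$?
    cat "$out"
    if [ "$rc" -eq 0 ]; then
      rm -f "$out"
      return 0
    fi
    if [ "$attempt" -lt 3 ] && grep -q "No space left on device" "$out"; then
      echo "disk space error on attempt $attempt/3, pruning cache..." >&2
      prune_cache
      sleep "$backoff"
    else
      rm -f "$out"
      return "$rc"
    fi
  done
}
```

Pruning before the sleep rather than after gives the engine the whole back-off window to release space, which is the ordering the commit describes.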
Thomas SharedInbox 0e9d7c907e fix(ci): retry on disk-space errors in check-dagger
The Dagger engine occasionally runs out of disk space during `flutter pub get`
when multiple CI jobs run in parallel. Space typically frees up within
~60 seconds as other containers finish. Add "No space left on device"
as a retryable condition with a 60 s back-off so PR runs survive the
transient shortage (run 4199480 was the trigger).
2026-05-24 00:23:43 +02:00
Thomas SharedInbox ae70646ed4 fix: enable core library desugaring for flutter_local_notifications (#183)
Both isCoreLibraryDesugaringEnabled = true in compileOptions and the
coreLibraryDesugaring("com.android.tools:desugar_jdk_libs:2.1.4")
dependency are already present in android/app/build.gradle.kts from
the earlier fix in #37. This commit closes issue #183 which was opened
to track the same requirement.
2026-05-24 00:13:23 +02:00
7 changed files with 149 additions and 9 deletions
+33
@@ -30,11 +30,44 @@ jobs:
DAGGER_CLIENT_KEY: ${{ secrets.DAGGER_CLIENT_KEY }}
run: scripts/setup_dagger_remote.sh
- name: Locate Docker daemon for local Dagger engine
run: |
# Skip if remote Dagger engine is already configured (preferred path)
if [ -n "${_DAGGER_RUNNER_HOST:-}" ]; then
echo "Remote Dagger engine configured, no local Docker needed."
exit 0
fi
# Try host Docker socket (DooD) if runner mounts it
if [ -S /var/run/docker.sock ]; then
if DOCKER_HOST=unix:///var/run/docker.sock docker info >/dev/null 2>&1; then
echo "Docker available via host socket."
echo "DOCKER_HOST=unix:///var/run/docker.sock" >> "$GITHUB_ENV"
exit 0
fi
fi
echo "WARNING: No remote Dagger engine and no local Docker found." >&2
echo " - Remote engine: check DAGGER_STUNNEL_URL secret and that the host proxy is running." >&2
echo " - Local Docker: runner does not expose /var/run/docker.sock." >&2
echo "CI will likely fail at the Dagger step." >&2
- name: Prune Dagger cache before check
env:
DAGGER_NO_NAG: "1"
run: dagger query '{ engine { localCache { prune } } }' 2>/dev/null || true
- name: Run Full Check Suite
env:
DAGGER_NO_NAG: "1"
run: task check-dagger
- name: Prune Dagger cache after check
if: always()
env:
DAGGER_NO_NAG: "1"
run: dagger query '{ engine { localCache { prune } } }' 2>/dev/null || true
- name: Cleanup TLS credentials
if: always()
run: rm -rf /tmp/dagger-tls /tmp/stunnel-dagger.conf /tmp/stunnel.pid
+2
@@ -31,6 +31,7 @@ jobs:
run: scripts/setup_dagger_remote.sh
- name: Run Android Tests on Firebase Test Lab
if: ${{ secrets.FIREBASE_TEST_LAB_SERVICE_ACCOUNT_KEY != '' }}
env:
FIREBASE_TEST_LAB_SERVICE_ACCOUNT_KEY: ${{ secrets.FIREBASE_TEST_LAB_SERVICE_ACCOUNT_KEY }}
FIREBASE_PROJECT_ID: ${{ vars.FIREBASE_PROJECT_ID }}
@@ -66,6 +67,7 @@ jobs:
run: scripts/setup_dagger_remote.sh
- name: Publish Android to Play Store
if: ${{ secrets.PLAY_STORE_CONFIG_JSON != '' }}
env:
ANDROID_KEYSTORE_BASE64: ${{ secrets.ANDROID_KEYSTORE_BASE64 }}
ANDROID_KEYSTORE_PASSWORD: ${{ secrets.ANDROID_KEYSTORE_PASSWORD }}
+6 -1
@@ -284,8 +284,13 @@ tasks:
for attempt in 1 2 3; do
run_dagger "$@" && return 0
RC=$?
if [ "$attempt" -lt 3 ] && grep -qE "connection reset|context canceled|connection refused" "$DAGGER_OUT"; then
if [ "$attempt" -lt 3 ] && grep -qE "connection reset|context canceled|connection refused|invalid return status code" "$DAGGER_OUT"; then
echo "$(_ts) dagger: network error on attempt $attempt/3, retrying..." >&2
elif [ "$attempt" -lt 3 ] && grep -q "No space left on device" "$DAGGER_OUT"; then
echo "$(_ts) dagger: disk space error on attempt $attempt/3, pruning Dagger cache..." >&2
dagger query '{ engine { localCache { prune } } }' 2>/dev/null || true
echo "$(_ts) dagger: waiting 90s for freed space to settle..." >&2
sleep 90
else
return "$RC"
fi
+4
@@ -6,6 +6,7 @@ import 'package:drift/drift.dart';
import 'package:drift/native.dart';
import 'package:enough_mail/enough_mail.dart' as imap;
import 'package:flutter/services.dart';
import 'package:flutter/widgets.dart';
import 'package:path/path.dart' as p;
import 'package:path_provider/path_provider.dart';
@@ -24,6 +25,9 @@ const _kResourceType = 'background_check';
@pragma('vm:entry-point')
void callbackDispatcher() {
// Required so that path_provider and other plugins are available in this
// background isolate (issue #192).
WidgetsFlutterBinding.ensureInitialized();
Workmanager().executeTask((_, __) async {
try {
await _doBackgroundSync();
+46 -1
@@ -609,6 +609,17 @@ Future<String> _resolveDatabasePath() async {
await Future<void>.delayed(Duration(milliseconds: ms));
}
}
// On Android, path_provider can be permanently broken on some devices
// regardless of how long we wait (issue #192). Derive the path from
// /proc/self/cmdline (the Android process name == package name) without
// a platform channel as a last resort so the app can still open its DB.
if (Platform.isAndroid) {
final fallback = await _androidFallbackPath();
if (fallback != null) {
_dbPath = fallback;
return _dbPath!;
}
}
throw PlatformException(
code: 'channel-error',
message: 'path_provider unavailable after ${delays.length + 1} attempts — '
@@ -616,10 +627,44 @@ Future<String> _resolveDatabasePath() async {
);
}
// These two functions are only called from unit tests (database_path_test.dart).
// Reads /proc/self/cmdline to extract the Android package name, then
// constructs the standard app files-dir path without a platform channel.
// Returns null when the path cannot be determined or created.
Future<String?> _androidFallbackPath() async {
try {
final bytes = await File('/proc/self/cmdline').readAsBytes();
final end = bytes.indexOf(0);
final packageName = String.fromCharCodes(
end >= 0 ? bytes.sublist(0, end) : bytes,
).trim();
// A valid Android package name contains dots but not slashes.
if (packageName.isEmpty ||
!packageName.contains('.') ||
packageName.contains('/')) {
return null;
}
for (final base in [
'/data/user/0/$packageName/files',
'/data/data/$packageName/files',
]) {
try {
await Directory(base).create(recursive: true);
return p.join(base, 'sharedinbox.db');
} catch (_) {
continue;
}
}
return null;
} catch (_) {
return null;
}
}
// These functions are only called from unit tests (database_path_test.dart).
// They expose internals that cannot be reached via the public API.
Future<String> resolveDatabasePathForTesting() => _resolveDatabasePath();
void resetDatabasePathForTesting() => _dbPath = null;
Future<String?> androidFallbackPathForTesting() => _androidFallbackPath();
LazyDatabase _openConnection() {
return LazyDatabase(() async {
+35 -7
@@ -14,14 +14,42 @@ if [ "$host" == "$port" ]; then
port="8774"
fi
echo "Probing $host:$port..."
if ! nc -zw 3 "$host" "$port" 2>/dev/null; then
echo "Error: No Dagger server responded on $host:$port"
exit 1
fi
echo "Found active Dagger server on $host:$port"
MAX_PROBE_ATTEMPTS=5
PROBE_DELAY=30
for attempt in $(seq 1 $MAX_PROBE_ATTEMPTS); do
echo "Probing $host:$port (attempt $attempt/$MAX_PROBE_ATTEMPTS)..."
if nc -zw 5 "$host" "$port" 2>/dev/null; then
echo "Found active server on $host:$port"
break
fi
if [ "$attempt" -eq "$MAX_PROBE_ATTEMPTS" ]; then
echo "Warning: No Dagger server responded on $host:$port after $MAX_PROBE_ATTEMPTS attempts"
echo "Remote engine unavailable — CI will use the local Dagger engine."
exit 0
fi
echo "Dagger server not responding, waiting ${PROBE_DELAY}s before retry..."
sleep $PROBE_DELAY
done
# 2. Setup TLS credentials (passed as env vars from secrets)
# 2a. Try plain TCP connection first (works when server is a plain TCP proxy, no TLS)
echo "Trying plain TCP Dagger connection at tcp://$host:$port..."
if _DAGGER_RUNNER_HOST="tcp://$host:$port" \
_EXPERIMENTAL_DAGGER_RUNNER_HOST="tcp://$host:$port" \
timeout 8 dagger version >/dev/null 2>&1; then
echo "Plain TCP Dagger connection succeeded — no TLS stunnel needed."
if [ -n "${GITHUB_ENV:-}" ]; then
echo "_EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://$host:$port" >> "$GITHUB_ENV"
echo "_DAGGER_RUNNER_HOST=tcp://$host:$port" >> "$GITHUB_ENV"
else
export _EXPERIMENTAL_DAGGER_RUNNER_HOST="tcp://$host:$port"
export _DAGGER_RUNNER_HOST="tcp://$host:$port"
echo "Dagger configured at tcp://$host:$port (plain TCP)"
fi
exit 0
fi
echo "Plain TCP connection not available; trying TLS stunnel..."
# 2b. Setup TLS credentials (passed as env vars from secrets)
mkdir -p /tmp/dagger-tls
echo "$DAGGER_CA_CERT" > /tmp/dagger-tls/ca.crt
echo "$DAGGER_CLIENT_CERT" > /tmp/dagger-tls/client.crt
+23
@@ -1,4 +1,5 @@
import 'dart:async';
import 'dart:io';
import 'package:fake_async/fake_async.dart';
import 'package:flutter/services.dart';
@@ -129,5 +130,27 @@ void main() {
);
});
},
// The Android fallback runs only on Android, so on the host machine the
// exception is still thrown after all retries. Skip on Android to avoid
// depending on /data/user/0/... being absent in the test environment.
skip: Platform.isAndroid,
);
// Regression test for issue #192: _androidFallbackPath must return null when
// the process cmdline does not look like an Android package name (e.g. on
// the host test machine where the process is the Dart executable).
test(
'_androidFallbackPath returns null when process name is not a package name',
() async {
// On non-Android platforms the host process cmdline is a file-system path
// (starts with '/'), which the fallback correctly rejects. On Android
// the process IS named after the package — the fallback is free to
// succeed or return null depending on the device state; we do not assert
// here so as not to constrain Android behaviour.
if (!Platform.isAndroid) {
final result = await androidFallbackPathForTesting();
expect(result, isNull);
}
},
);
}