Skip to content

fix(breakpad): avoid stdio deadlock in macOS exception handler#1656

Merged
jpnurmi merged 3 commits intomasterfrom
jpnurmi/fix/signal-safe-log
Apr 17, 2026
Merged

fix(breakpad): avoid stdio deadlock in macOS exception handler#1656
jpnurmi merged 3 commits intomasterfrom
jpnurmi/fix/signal-safe-log

Conversation

@jpnurmi
Copy link
Copy Markdown
Collaborator

@jpnurmi jpnurmi commented Apr 16, 2026

On macOS, breakpad's Mach exception handler runs while all other threads are suspended by the kernel. Any vfprintf-based log call from the handler can deadlock waiting for a stdio file lock held by a frozen thread — this is what is currently hanging macOS 14 (xcode llvm) CI on #1650 after test_cache_max_items_with_retry[inproc].

Default enable_logging_when_crashed to off for the breakpad backend on macOS so the existing sentry__logger_disable() path in the callback fires by default. Users who accept the risk can still opt in via sentry_options_set_logger_enabled_when_crashed(opts, 1).

Share the async-signal-safe SENTRY_SIGNAL_SAFE_LOG macro via sentry_logger.h (previously private to the inproc backend) and convert the five explicit log sites in the breakpad callback plus the crash-safe flush helpers in sentry_metrics, sentry_logs, and sentry_batcher. These sites use write(STDERR_FILENO, ...) directly, so they stay visible during crash handling regardless of the logger state.

Captured stack from a local repro (8x CPU yes load, 4-iteration log cache-keep crash loop against a TCP-RST DSN):

Thread sentry-http (suspended):
    curl_easy_perform -> Curl_verboseconnect -> Curl_debug -> fwrite
        -> __sfvwrite -> __write_nocancel        (holds stderr flockfile)

Thread breakpad handler (blocked):
    breakpad_backend_callback -> sentry__logger_log
        -> sentry__logger_defaultlogger -> vfprintf -> flockfile
            -> __psynch_mutexwait                 (waits forever)

sentry_options_set_debug(true) (via the log example arg used in most cache tests) enables CURLOPT_VERBOSE, which is what reliably catches the worker mid-fwrite on stderr when the crash fires.

A related deadlock class still exists inside breakpad's own MinidumpGenerator (operator new -> mfm_alloc) when another thread is suspended mid-allocation. That is outside sentry-native's reach and not addressed here.

jpnurmi and others added 2 commits April 16, 2026 18:01
On macOS, breakpad uses an in-process Mach exception server: when the
process crashes, the kernel suspends all threads of the task and delivers
the exception to breakpad's handler thread. Any stdio lock (e.g. the
stderr file lock acquired by `flockfile` inside `vfprintf`/`fwrite`) held
by a thread that was suspended at crash time is never released — so any
`vfprintf`-based logging from the handler deadlocks the process.

A repro sample captured in CI shows the pattern:

  sentry-http (worker, suspended):
    curl_easy_perform -> Curl_verboseconnect -> Curl_debug -> fwrite
      -> __sfvwrite -> __write_nocancel    (holding stderr flockfile)
  breakpad exception handler (running):
    breakpad_backend_callback -> sentry__logger_log
      -> sentry__logger_defaultlogger -> vfprintf -> flockfile
        -> __psynch_mutexwait            (blocked forever)

The trigger in our tests is `sentry_options_set_debug(true)`, which
enables `CURLOPT_VERBOSE` and routes libcurl's verbose output to stderr
via `fwrite`. Under load (CI macos-14 runner, local repro under 8x CPU
stress) the worker is reliably caught mid-`fwrite` when the main thread
crashes.

Two changes address this:

1. Default `enable_logging_when_crashed` to off for the breakpad backend
   on macOS, so the callback's `sentry__logger_disable()` fires by
   default and no reachable `vfprintf` call is made. Users who accept
   the risk can still opt in by calling
   `sentry_options_set_logger_enabled_when_crashed(opts, 1)`.

2. Move the signal-safe log macro from `sentry_backend_inproc.c` into
   `sentry_logger.h` and convert the five explicit log sites in the
   breakpad callback, plus the crash-safe flush helpers in
   `sentry_metrics`, `sentry_logs`, and `sentry_batcher`, to use it.
   `SENTRY_SIGNAL_SAFE_LOG` writes a static string directly via
   `write(STDERR_FILENO, ...)`, bypassing the stdio lock entirely — so
   those lines remain visible during crash handling even when the
   regular logger is disabled.

A related class of deadlock still exists in breakpad's own
`MinidumpGenerator` (`operator new` → `mfm_alloc`) when another thread
is suspended mid-allocation; that is outside sentry-native's control
and not addressed here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi requested a review from supervacuus April 17, 2026 06:42
Copy link
Copy Markdown
Collaborator

@supervacuus supervacuus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor doc correction, but otherwise great. Thanks!

Comment thread src/sentry_options.c Outdated
The thread suspension happens in breakpad's minidump-writing path, not
via Mach exception delivery — our callback runs inside
WriteMinidumpWithException() while breakpad has the other threads held.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/sentry_options.c
@jpnurmi jpnurmi merged commit 6c7f08b into master Apr 17, 2026
59 checks passed
@jpnurmi jpnurmi deleted the jpnurmi/fix/signal-safe-log branch April 17, 2026 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants