A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy. by VSadov · Pull Request #123921 · dotnet/runtime

VSadov · 2026-02-02T23:42:36Z

Changes:

Correctly handle Backoff.Exponential(0).

Embarrassing bug.
To get an exponentially growing random spin count for an iteration, we generate a pseudorandom uint and compute >> (32 - attempt). Since C# masks the shift count of a 32-bit operand with 31, when attempt == 0 the shift count of 32 becomes 0, so we end up not shifting at all and the first iteration gets a large random spin count.
That caused many noisy results and, interestingly, some improvements (in scenarios that benefit from very long spins).
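
The masking behaviour can be reproduced in a few lines. The sketch below uses Python only to emulate C#'s 32-bit shift semantics; `csharp_shr_uint`, `buggy_spin_count`, and `fixed_spin_count` are hypothetical names, and the attempt-0 guard is just one way to skip spinning on the first attempt, not necessarily the exact code in Backoff.cs.

```python
MASK32 = 0xFFFFFFFF


def csharp_shr_uint(value: int, count: int) -> int:
    """Emulate C# `uint >> int`: only the low 5 bits of the count are used."""
    return (value & MASK32) >> (count & 31)


def buggy_spin_count(rnd: int, attempt: int) -> int:
    # Direct transcription of `rnd >> (32 - attempt)`.
    # For attempt == 0 the count is 32, which C# masks down to 0.
    return csharp_shr_uint(rnd, 32 - attempt)


def fixed_spin_count(rnd: int, attempt: int) -> int:
    # Assumed fix: treat attempt 0 as "do not spin" instead of shifting.
    if attempt <= 0:
        return 0
    return csharp_shr_uint(rnd, 32 - attempt)
```

With `rnd = 0xDEADBEEF`, the buggy version returns the whole random value for attempt 0, while attempts 1 and 4 keep only the top 1 and 4 bits respectively, as intended.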

Unified implementation of LIFO policy with lightweight minimal implementation of LIFO waiting.

Once we are done spinning, we block threads and when workers are needed again wake them in LIFO order.

The Unix WaitSubsystem is pretty heavy for these needs. It supports interruptible waits, waiting on multiple objects, etc. None of that is interesting here. Most calls into the subsystem take a global process-wide lock, which can contend under load with other uses, and a thread waking workers may contend with workers going to sleep.

Windows used an opaque GetQueuedCompletionStatus for its side effect of releasing threads in LIFO order when a completion is posted. Its overheads and interactions are unknown, even though it is typically more efficient than the Unix WaitSubsystem.

The portable implementation seems to be faster than either of the platform-specific ones.
(measured by disabling spinning and running a few latency-sensitive benchmarks).

The portable implementation is also easier to reason about, and anomalies in it are easier to debug.
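
To make the LIFO wake-up policy concrete, here is a minimal sketch, not the PR's implementation. It assumes a plain lock and one event per blocked thread for simplicity, whereas the PR builds on futex/WaitOnAddress; `LifoSemaphore` and its members are hypothetical names.

```python
import threading


class LifoSemaphore:
    """Toy semaphore that wakes the most recently blocked waiter first."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []   # stack of per-thread events; last item = newest waiter
        self._pending = 0    # signals released while nobody was waiting

    def wait(self, timeout=None):
        with self._lock:
            if self._pending > 0:
                self._pending -= 1          # consume a signal without blocking
                return True
            ev = threading.Event()
            self._waiters.append(ev)        # push ourselves on top of the stack
        if ev.wait(timeout):
            return True
        with self._lock:
            if ev in self._waiters:         # genuinely timed out: unregister
                self._waiters.remove(ev)
                return False
        return True                         # a release raced with the timeout

    def release(self):
        with self._lock:
            if not self._waiters:
                self._pending += 1          # remember the signal for a later wait
                return
            ev = self._waiters.pop()        # LIFO: newest waiter wakes first
        ev.set()
```

Waking the newest waiter keeps recently active threads busy and lets the oldest ones stay blocked, which is the point of a LIFO policy.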

Adaptive spinning in the threadpool based on estimates of CPU core availability.

Spinning in the threadpool is very tricky, and its benefits differ greatly between scenarios. For some scenarios, the longer the spin the better. But other scenarios benefit when the threadpool releases cores quickly once it sees no work. No preset fixed spin count is going to be good for everything.

An adaptive approach appears to be necessary to improve some scenarios without regressing many others.
We can further improve the heuristic if there are more ideas.
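
As a purely illustrative sketch of what "adaptive" can mean here, the function below scales a spin budget by the estimated share of cores not occupied by active workers. This is an assumption for illustration and not the heuristic used in the PR; `adaptive_spin_limit` and its parameters are hypothetical.

```python
def adaptive_spin_limit(max_spins: int, processor_count: int, active_threads: int) -> int:
    """Hypothetical heuristic: spin less when fewer cores appear to be free."""
    if processor_count <= 0:
        raise ValueError("processor_count must be positive")
    # Estimate idle cores; oversubscription clamps to zero.
    idle_cores = max(processor_count - active_threads, 0)
    # Scale the budget linearly with the idle share of the machine.
    return max_spins * idle_cores // processor_count
```

With this shape, a mostly idle machine allows long spins, while a saturated one makes workers block almost immediately and give their cores back.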

dotnet-policy-service · 2026-02-02T23:43:28Z

Tagging subscribers to this area: @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR addresses performance regressions in the threadpool semaphore (issue #123159) and unifies the Windows/Unix implementation of the LIFO (Last-In-First-Out) policy for threadpool worker thread management.

Changes:

Introduces a unified LowLevelThreadBlocker class that uses OS-provided compare-and-wait APIs (futex on Linux, WaitOnAddress on Windows) for efficient thread blocking, with a fallback to monitor-based implementation for other platforms
Refactors LowLevelLifoSemaphore to use the new blocker infrastructure, removes platform-specific Windows/Unix implementations, and improves spinning heuristics based on CPU availability
Adds native futex support for Linux through syscalls and Windows WaitOnAddress API interop
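
The "compare-and-wait" contract shared by futex and WaitOnAddress is: block only if the watched value still equals the expected one, so a wake issued after the value changed cannot be lost. The sketch below emulates that contract with a condition variable; it does not call the real OS APIs, and `Cell`, `wait_if_equal`, and `wake_one` are hypothetical names.

```python
import threading


class Cell:
    """A shared integer plus the wait queue an OS would key by its address."""

    def __init__(self, value=0):
        self.value = value
        self._cond = threading.Condition()


def wait_if_equal(cell, expected, timeout=None):
    """Block only while cell.value == expected; report why we returned."""
    with cell._cond:
        if cell.value != expected:
            return "mismatch"               # value already changed: do not sleep
        # The check and the sleep are atomic with respect to wake_one.
        return "woken" if cell._cond.wait(timeout) else "timeout"


def wake_one(cell):
    """Wake at most one thread blocked in wait_if_equal on this cell."""
    with cell._cond:
        cell._cond.notify(1)
```

A caller changes `cell.value` first and then calls `wake_one`; a waiter that arrives late sees the new value and returns "mismatch" instead of sleeping. Real compare-and-wait APIs may also wake spuriously, so callers normally re-check their condition in a loop.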

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

File	Description
src/native/libs/System.Native/pal_threading.h	Adds declarations for Linux futex operations
src/native/libs/System.Native/pal_threading.c	Implements futex wait/wake operations for Linux using syscalls
src/native/libs/System.Native/entrypoints.c	Registers new futex entrypoints for Linux
src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs	Fixes spelling, removes spin count configuration, passes active thread count to semaphore
src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs	New class providing portable thread blocking using futex/WaitOnAddress or monitor fallback
src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs	Major refactoring to use LowLevelThreadBlocker, implements LIFO queue with pending signals, improves spin heuristics
src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs	Deleted - functionality moved to unified implementation
src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs	Deleted - functionality moved to unified implementation
src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs	New file providing Windows WaitOnAddress API wrapper
src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs	New file providing Linux futex wrapper
src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs	Modified to return spin count and skip spinning on first attempt
src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems	Updates project to include new files and remove deleted platform-specific files
src/libraries/Common/src/Interop/Windows/Kernel32/Interop.WaitOnAddress.cs	New interop declarations for Windows WaitOnAddress and WakeByAddressSingle APIs
src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs	Adds SuppressGCTransition attribute to LeaveCriticalSection
src/libraries/Common/src/Interop/Windows/Kernel32/Interop.ConditionVariable.cs	Adds SuppressGCTransition attribute to WakeConditionVariable
src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs	Adds SuppressGCTransition attributes to Release and Signal_Release
src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs	New interop declarations for Linux futex operations

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

VSadov · 2026-02-03T03:57:03Z

One test that was affected by #123159 is
System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer

The test has one thread rent an array, mutate it, and pass it to another thread via a one-element buffer; the other thread inspects the buffer and releases it, and so on.
In particular, there is a scenario where both sides wait synchronously on the async result of a buffer operation. There is an occasional race where IsCompleted on the buffer operation returns false, but the subsequent OnCompleted sees a completed async operation. Since it can't attach a continuation to an already completed result, it posts a workitem to the threadpool and attaches the continuation to that. As a result, once in a while one of the threads playing buffer ping-pong effectively waits on the completion of such a workitem.

The scenario needs to wait on a task only occasionally, and how often depends on the environment (CPU speed, memory speed, ...). In general, though, the test is sensitive to the threadpool spinning long enough to execute the task without waking a thread.

The results after this PR, vs baseline:

=== Linux x64
(Azure VM, so it is what it is, but the test has few guest/host interactions and is not IO-heavy at all)

baseline:

Method	RentalSize	ManipulateArray	Async	UseSharedPool	Mean	Error	StdDev	Median	Min	Max	Gen0	Allocated
ProducerConsumer	4096	False	False	False	1.787 us	0.0995 us	0.1146 us	1.800 us	1.5178 us	1.997 us	0.0050	84 B
ProducerConsumer	4096	False	False	True	1.836 us	0.1398 us	0.1610 us	1.859 us	1.3978 us	2.057 us	-	82 B
ProducerConsumer	4096	False	True	False	1.035 us	0.0642 us	0.0739 us	1.052 us	0.7293 us	1.074 us	-	-
ProducerConsumer	4096	False	True	True	1.095 us	0.0601 us	0.0692 us	1.122 us	0.8213 us	1.135 us	-	-
ProducerConsumer	4096	True	False	False	1.927 us	0.0647 us	0.0692 us	1.939 us	1.7744 us	2.025 us	-	6 B
ProducerConsumer	4096	True	False	True	1.952 us	0.0584 us	0.0672 us	1.952 us	1.7911 us	2.045 us	-	2 B
ProducerConsumer	4096	True	True	False	1.494 us	0.0366 us	0.0422 us	1.491 us	1.4217 us	1.575 us	-	-
ProducerConsumer	4096	True	True	True	1.875 us	0.0916 us	0.1055 us	1.879 us	1.6830 us	2.075 us	-	-

after the PR:
(mostly improvements, some scenarios really like long spins though....)

Method	RentalSize	ManipulateArray	Async	UseSharedPool	Mean	Error	StdDev	Median	Min	Max	Gen0	Allocated
ProducerConsumer	4096	False	False	False	1,706.1 ns	122.59 ns	141.18 ns	1,709.4 ns	1,486.7 ns	1,921.7 ns	0.0050	83 B
ProducerConsumer	4096	False	False	True	1,737.3 ns	134.77 ns	155.20 ns	1,723.8 ns	1,558.4 ns	2,107.2 ns	-	83 B
ProducerConsumer	4096	False	True	False	855.6 ns	25.55 ns	28.40 ns	860.4 ns	740.1 ns	869.3 ns	-	-
ProducerConsumer	4096	False	True	True	1,035.9 ns	21.40 ns	23.79 ns	1,038.6 ns	965.3 ns	1,065.8 ns	-	-
ProducerConsumer	4096	True	False	False	1,450.8 ns	53.79 ns	57.55 ns	1,446.4 ns	1,326.9 ns	1,572.2 ns	-	1 B
ProducerConsumer	4096	True	False	True	2,383.5 ns	261.28 ns	300.89 ns	2,273.3 ns	2,037.4 ns	3,040.7 ns	-	9 B
ProducerConsumer	4096	True	True	False	1,503.2 ns	57.55 ns	66.28 ns	1,512.2 ns	1,375.6 ns	1,584.2 ns	-	-
ProducerConsumer	4096	True	True	True	1,842.5 ns	68.79 ns	79.22 ns	1,837.4 ns	1,712.4 ns	2,035.9 ns	-	-

VSadov · 2026-02-03T06:46:44Z

Same tests on Windows:
(clearly an improvement)

BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 11 (10.0.26200.7623)
AMD Ryzen 9 7950X 4.50GHz, 1 CPU, 32 logical and 16 physical cores

=== baseline:

Method	RentalSize	ManipulateArray	Async	UseSharedPool	Mean	Error	StdDev	Median	Min	Max	Gen0	Allocated
ProducerConsumer	4096	False	False	False	552.4 ns	23.25 ns	25.84 ns	549.7 ns	514.5 ns	601.0 ns	0.0050	84 B
ProducerConsumer	4096	False	False	True	734.0 ns	26.71 ns	29.69 ns	730.6 ns	671.1 ns	795.4 ns	0.0025	83 B
ProducerConsumer	4096	False	True	False	325.9 ns	13.82 ns	14.19 ns	325.1 ns	304.5 ns	358.8 ns	-	-
ProducerConsumer	4096	False	True	True	367.9 ns	7.89 ns	9.09 ns	369.1 ns	332.3 ns	376.9 ns	-	-
ProducerConsumer	4096	True	False	False	1,390.5 ns	447.74 ns	515.62 ns	1,050.2 ns	959.7 ns	2,085.7 ns	-	58 B
ProducerConsumer	4096	True	False	True	1,380.9 ns	32.70 ns	37.65 ns	1,386.5 ns	1,286.9 ns	1,437.0 ns	-	32 B
ProducerConsumer	4096	True	True	False	886.0 ns	17.08 ns	18.28 ns	889.2 ns	852.0 ns	922.1 ns	-	-
ProducerConsumer	4096	True	True	True	1,012.1 ns	19.45 ns	18.20 ns	1,007.6 ns	979.7 ns	1,043.3 ns	-	-

=== this PR:

Method	RentalSize	ManipulateArray	Async	UseSharedPool	Mean	Error	StdDev	Median	Min	Max	Gen0	Allocated
ProducerConsumer	4096	False	False	False	395.7 ns	26.11 ns	25.64 ns	384.3 ns	373.3 ns	462.0 ns	0.0050	84 B
ProducerConsumer	4096	False	False	True	498.8 ns	73.72 ns	81.94 ns	480.5 ns	399.9 ns	667.2 ns	0.0050	83 B
ProducerConsumer	4096	False	True	False	249.4 ns	4.31 ns	3.37 ns	249.8 ns	243.8 ns	257.0 ns	-	-
ProducerConsumer	4096	False	True	True	321.5 ns	5.30 ns	4.42 ns	319.3 ns	317.4 ns	330.1 ns	-	-
ProducerConsumer	4096	True	False	False	960.5 ns	30.51 ns	35.14 ns	955.2 ns	914.2 ns	1,035.8 ns	-	5 B
ProducerConsumer	4096	True	False	True	1,255.1 ns	25.08 ns	24.63 ns	1,256.5 ns	1,209.8 ns	1,304.5 ns	-	30 B
ProducerConsumer	4096	True	True	False	883.0 ns	15.86 ns	14.83 ns	882.2 ns	863.3 ns	917.1 ns	-	-
ProducerConsumer	4096	True	True	True	1,056.4 ns	19.67 ns	18.40 ns	1,059.9 ns	1,014.1 ns	1,087.3 ns	-	-

VSadov · 2026-02-03T07:42:08Z

TE benchmarks seem to favor the change as well.

Unlike the ProducerConsumer microbenchmark, TE does not like long threadpool spins, likely because there are non-threadpool threads, such as epoll threads, competing for cores.
This shows that the threadpool heuristics can make the right adjustments.

Using command:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml --scenario json    --profile aspnet-gold-lin  --application.framework net11.0 --application.options.outputFiles <. . .>

=== Baseline:

| First Request (ms)        | 172                 |
| Requests/sec              | 1,828,617           |
| Requests                  | 27,611,979          |
| Mean latency (ms)         | 0.14                |
| Max latency (ms)          | 12.27               |
| Bad responses             | 0                   |
| Socket errors             | 0                   |
| Read throughput (MB/s)    | 291.23              |
| Latency 50th (ms)         | 0.12                |
| Latency 75th (ms)         | 0.16                |
| Latency 90th (ms)         | 0.22                |
| Latency 99th (ms)         | 0.37                |

=== This PR:

| First Request (ms)        | 171                 |
| Requests/sec              | 1,846,521           |
| Requests                  | 27,882,744          |
| Mean latency (ms)         | 0.14                |
| Max latency (ms)          | 7.00                |
| Bad responses             | 0                   |
| Socket errors             | 0                   |
| Read throughput (MB/s)    | 294.08              |
| Latency 50th (ms)         | 0.12                |
| Latency 75th (ms)         | 0.16                |
| Latency 90th (ms)         | 0.22                |
| Latency 99th (ms)         | 0.37                |

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.

VSadov · 2026-02-15T02:15:28Z

Thanks!!!

Copilot AI review requested due to automatic review settings February 2, 2026 23:42

VSadov added the area-System.Threading label Feb 2, 2026

dotnet-policy-service bot assigned VSadov Feb 2, 2026

Copilot started reviewing on behalf of VSadov February 2, 2026 23:43

Copilot AI reviewed Feb 2, 2026

Copilot AI review requested due to automatic review settings February 3, 2026 00:16

Copilot started reviewing on behalf of VSadov February 3, 2026 00:16

Copilot AI reviewed Feb 3, 2026

Copilot AI review requested due to automatic review settings February 3, 2026 01:35

Copilot started reviewing on behalf of VSadov February 3, 2026 01:35

Copilot AI reviewed Feb 3, 2026

VSadov force-pushed the lifo branch from d31e21c to 11ff46e Compare February 3, 2026 06:04

VSadov marked this pull request as ready for review February 3, 2026 07:43

VSadov requested a review from stephentoub February 3, 2026 07:51

Copilot AI review requested due to automatic review settings February 3, 2026 21:23

Copilot started reviewing on behalf of VSadov February 3, 2026 21:24

Copilot AI reviewed Feb 3, 2026

build-analysis bot mentioned this pull request Feb 4, 2026

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

VSadov force-pushed the lifo branch from f6af178 to f4d2258 Compare February 4, 2026 06:54

Copilot AI review requested due to automatic review settings February 4, 2026 08:59

Copilot started reviewing on behalf of VSadov February 4, 2026 09:00

Copilot AI reviewed Feb 4, 2026

VSadov and others added 21 commits February 14, 2026 22:22

Typo: counted current thread as active twice.

dc1a4e6

limit acceptable values of ThreadPool_UnfairSemaphoreSpinLimit

afd874d

fix OSX and FreeBSD builds.

1630e9f

suppress unused-parameter in placeholders

a8c5ab9

suppress missing-noreturn in placeholders

62a3d0b

futex timeout is relative by default

e5b0f28

relaxed an assert.

8a52a38

fix Linux x86 build

b7c44fb

Added assert in ~LowLevelThreadBlocker() as in other similar finalizers.

8f70ad7

assert that _pendingSignals is not too large.

134381f

idempotent Dispose

73af91f

Update src/libraries/System.Private.CoreLib/src/System/Threading/LowL…

5853342

…evelThreadBlocker.cs Co-authored-by: Jan Kotas <jkotas@microsoft.com>

some more PR feedback

9b78e16

more feedback

1a74f00

PR feedback (Mincore)

4be373d

add Synchronization.lib to native libs

c49c135

Unused Usings

3994f31

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

avoid extremely unlikely int overflow

bbfae6d

Add Synchronization.lib to NAOT repro project.

793ec96

separate the new manually introduced entries in WindowsAPI.txt

dd61a6b

remove SuppressGCTransition on waking APIs

13d7a70

Copilot AI review requested due to automatic review settings February 14, 2026 22:22

VSadov force-pushed the lifo branch from 1752310 to 13d7a70 Compare February 14, 2026 22:22

Copilot started reviewing on behalf of VSadov February 14, 2026 22:23

Copilot AI reviewed Feb 14, 2026

stephentoub approved these changes Feb 14, 2026

VSadov merged commit fbf6be5 into dotnet:main Feb 15, 2026
176 checks passed

VSadov deleted the lifo branch February 15, 2026 02:14

dotnet-maestro bot mentioned this pull request Feb 16, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#4873

Merged
