Fix; target feature inline always by Jamesbarford · Pull Request #155426 · rust-lang/rust

Jamesbarford · 2026-04-17T10:00:22Z

Fixes for #[inline(always)] with target features

Checks if the target feature on the caller could affect the callee with an incompatible abi.
Checks if a globally enabled target feature makes what would otherwise not be inlined, inlined (see test; tests/ui/target-feature/inline-always-vector-abi-global-avx.rs).
If the callee enables any target feature the caller doesn't have, we bail.
If the caller enables target features the callee does not have and they do not affect the abi. We inline.
If the caller enables target features that affect the abi and the callee does not have those features, we bail.
Report errors to user both for caller mismatch and callee mismatch. But only explicitly what the user missed out, e.g we don't report that sse2,ssse3 ... etc is missing if the user wrote avx on a function and the inlining logic bails because of a mismatch.
Update tests to support these use cases.

Addresses concerns in #145574

rustbot · 2026-04-17T10:00:25Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

rustbot · 2026-04-17T10:00:28Z

r? @mejrs

rustbot has assigned @mejrs.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: compiler
compiler expanded to 72 candidates
Random selection from 18 candidates

mejrs · 2026-04-17T19:10:53Z

This is pretty far out of my expertise, so

r? compiler

RalfJung · 2026-04-20T13:43:11Z

+    pub fn feature_could_influence_vector_length(&self, feature: &str) -> bool {
+        self.features_for_correct_fixed_length_vector_abi().iter().any(|(_, name)| *name == feature)
+            || self.features_for_correct_scalable_vector_abi() == Some(feature)
+    }


So the assumption is that if the caller has at least all the target features the callee has, and also they exactly agree on the target features listed here, then inlining is safe?

I am not sure if that's correct. For instance, the cx16 target feature on x86 is also not inlining-safe -- see llvm/llvm-project#187503.

I think the only safe thing we can do here is have a list of target features that we are certain are safe to inline.

View changes since the review

What you are basically doing here is duplicating the per-target InlineFeatureIgnoreList information in LLVM.

I could be missing something here.

Using #[inline(always)] on that example would yield the following error;

warning: call to `#[inline(always)]`-annotated `load` requires the same target features to be inlined --> ./cmpxchg.rs:13:14 | 13 | unsafe { load(x) } | ^^^^^^^ | = note: function will not be inlined = note: the following target features are on `load` but missing from `load_core`: cmpxchg16b note: `load` is defined here --> ./cmpxchg.rs:6:1 | 6 | fn load(x: *const u128) -> u128 { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = note: `#[warn(inline_always_mismatching_target_features)]` on by default help: add `#[target_feature]` attribute to `load_core` | 12 + #[target_feature(enable = "cmpxchg16b")] 13 | fn load_core(x: *const u128) -> u128 {

The callee is not a subset of the callers target features

Though it looks like the load gets inlined irrespective of the attribute, I've tried with no attribute, #[inline] and #[inline(always)] with changes from this branch.

Perhaps if this error is emitted, to ensure that the function is not inlined we could place a noinline attribute at the callsite?

f55f83e implements what I was thinking with noinline along with a test

It seems like there's a failure condition if the caller has cx16 and the callee does not, if I understood the issue correctly? Cc @nikic

Generally any difference of target features between caller and callee is extremely suspicious for inlining in an LLVM-based compiler, and the features_for_correct_fixed_length_vector_abi/features_for_correct_scalable_vector_abi were never meant to be used for that check, so I am highly uncomfortable with using them like this.

It seems like there's a failure condition if the caller has cx16 and the callee does not, if I understood the issue correctly?

To make sure I'm following, the inverse of the linked issue? If the caller has cx16 and the callee does not?

Something like;

#![feature(core_intrinsics)] pub fn load(x: *const u128) -> u128 { use std::intrinsics::{AtomicOrdering, atomic_load}; unsafe { atomic_load::<u128, { AtomicOrdering::Relaxed }>(x) } } #[unsafe(no_mangle)] #[target_feature(enable = "cmpxchg16b")] // <- feature here fn load_core(x: *const u128) -> u128 { unsafe { load(x) } }

Now the inline'd load(x) will emit a cmpxchg16b as the caller has that target feature https://godbolt.org/z/xcfqveqKs

Hm yeah there is something here that I do not understand. load_internal has more target features than its caller so why should that ever get inlined? I thought LLVM only ever inlines when the callee has fewer target features than the caller (and then the target features that are missing from the callee also can't be ABI-relevant).

Apparently features on the InlineFeatureIgnoreList are entirely ignored by the inliner, i.e. the callee is even allowed to have more features than the caller. So that list cannot contain any features that unlock extra instructions. I didn't realize that such target features even exist.

RalfJung · 2026-04-20T13:44:07Z

Cc @nikic

…waysinline`

RalfJung · 2026-04-21T15:39:30Z

+            // an attribute and the caller and callee are compatible for
+            // inlining here. Otherwise we explicitly emit a `noinline` to
+            // ensure that the function will not get inlined through an LLVM
+            // pass.


To echo the problem @tmiasko raised in #145574, which I now finally understand: LLVM can move a call site to another function via inlining. So whatever reasoning we do here based on the attributes of the current caller is largely pointless since we don't know the attributes of the actual caller that this call may eventually end up in.

View changes since the review

Actually constructing a counterexample could be tricky because LLVM seems to inline alwaysinline functions first, so I have not managed to get it to move around an alwaysinline call site. But that seems like a fragile property to rely on.

The entire approach to target_feature_inline_always seems very unprincipled to me. We have no solid writeup of when alwaysinline is safe to put on a call site in LLVM IR, which means we don't even know the exact property rustc has to check before adding that attribute. This feature needs less "let's implement something and see if it works" (an approach that does not work for optimizations where the test suite is never even close to being able to find all bugs) and more "let's figure out a principled argument for why what we want to do could be correct".

Right now, based on the most recent example by @tmiasko , I am not convinced that there even exists a sound way to use alwaysinline in LLVM IR for calls to functions with extra target features. Even the most conservative option where we require full feature equality seems to go wrong:

#[inline(never)] #[target_feature(enable = "sse")] pub fn i(x: &__m256) { std::hint::black_box(x); } #[inline(always)] #[target_feature(enable = "sse")] pub fn f(x: &__m256) { i(x); } #[target_feature(enable = "sse")] pub fn g(x: &__m256) { f(x) // alwaysinline call site } #[target_feature(enable = "sse", enable = "avx")] pub fn h(x: &__m256) { g(x) }

Imagine LLVM first inlines g into h (sound because none of the calls in g has a target-feature-dependent ABI). Then LLVM changes i to receive the argument by-value (sound because the only caller and callee have the same target features). We end up with:

#[inline(never)] #[target_feature(enable = "sse")] pub fn i(x: __m256) { // an actual by-value argument std::hint::black_box(x); } #[target_feature(enable = "sse")] pub fn f(x: &__m256) { i(*x); } #[target_feature(enable = "sse", enable = "avx")] pub fn h(x: &__m256) { f(x) // alwaysinline call site }

Now we inline f:

#[inline(never)] #[target_feature(enable = "sse")] pub fn i(x: __m256) { // an actual by-value argument std::hint::black_box(x); } #[target_feature(enable = "sse", enable = "avx")] pub fn h(x: &__m256) { i(*x) }

Now the caller and callee disagree on the ABI. Oopsie...

I think I understand your point, but I may still be missing something.

I took your example, removed #[inline(always)], and compared the output here: https://godbolt.org/z/8xs8Gfjbj. In both cases, whether f has #[inline(always)] or not, h ends up directly calling i.

I also tried removing all pub declarations except on h: https://godbolt.org/z/onszcGPPc. That produces the same output both with and without #[inline(always)] on f.

So from these examples, it does not seem like #[inline(always)] is making the situation less safe. The underlying problem looks like an LLVM issue that exists regardless, rather than something introduced by the attribute itself.

We have no solid writeup of when alwaysinline is safe to put on a call site in LLVM IR

What kind of writeup would you want here? The current code comments are probably not enough, but I want to make sure I understand what is missing and where you would expect it to live.

The rule this PR is trying to encode is:

The caller must not affect the callee's ABI.

The callee's target features must be a subset of the caller's target features.

If both conditions hold, we apply alwaysinline at the call site. If either condition fails, we instead apply noinline, which prevents further inlining at that call site.

The goal of this PR is not to solve every inlining-related issue. It is to make problematic cases of the specific #[inline(always)] attribute usage easier to detect and safer to handle.

For example, in the cx16 issue (load_internal as the callee), condition 2 would fail if load_internal were marked #[inline(always)]. In that case, this PR would emit a warning and apply noinline at the call site, preventing further LLVM inlining there. That is safer than today's #[inline] behaviour, because we both surface the problem to the user and block additional inlining at that call site. At the same time, because alwaysinline is no longer attached to the function definition itself like inlinehint, valid uses of load_internal can still be inlined.

Likewise, in the example from the tracking issue, which is the basis for this test case. Condition 1 fails, because the caller affects the callee's ABI.

I took your example, removed #[inline(always)], and compared the output here: https://godbolt.org/z/8xs8Gfjbj. In both cases, whether f has #[inline(always)] or not, h ends up directly calling i.

I also tried removing all pub declarations except on h: https://godbolt.org/z/onszcGPPc. That produces the same output both with and without #[inline(always)] on f.

Yes, LLVM today happens to do inlining in a different order that avoids the problem, at least for this particular example. But I would not want to bet the soundness of the language on LLVM always sticking to this inlining order.

That's why I described the expected order of applied optimizations in my example. You have to apply the optimizations manually to confirm (or reject) my reasoning. But as long as it is permitted for LLVM to do these optimizations in this order, that means the code we generate is unsound.

What kind of writeup would you want here?

I would like to see an argument for why the rules you came up with are enough to guarantee that we avoid LLVM's soundness bugs around alwaysinline. That's the main goal here, after all: alwaysinline is fundamentally busted, but this feature is trying to use it soundly somehow. So far I am not convinced that is even possible. The example above shows, I think, that even with the very strict rule "caller and callee must have the exact same target features", alwaysinline is still unsound.

You cannot argue for the correctness of those rules by pointing at some examples. You can only make such an argument by saying: here's how inlining in LLVM works, and here is why under every possible inlining choice LLVM might make on any program, the result will be sound. Usually we leave that work to LLVM, but the entire premise of this approach is that we do the work ourselves because LLVM doesn't do it properly. That's fundamentally very hard (much harder than doing it in LLVM) as it requires reasoning "behind LLVM's back", so we need to be very careful and deliberate.

Jamesbarford added 2 commits April 17, 2026 10:47

Update logic for handling target feature inline always mismatches

58ce116

Update diagnostic messages

df654be

rustbot assigned mejrs Apr 17, 2026

This comment has been minimized.

Sign in to view

Jamesbarford force-pushed the fix/target-feature-inline-always branch from 8b2613c to ef34bce Compare April 17, 2026 10:24

Jamesbarford mentioned this pull request Apr 17, 2026

Tracking Issue for target_feature_inline_always #145574

Open

7 tasks

This comment has been minimized.

Sign in to view

Update and create new tests

b269cc3

Jamesbarford force-pushed the fix/target-feature-inline-always branch from ef34bce to b269cc3 Compare April 17, 2026 11:55

rustbot assigned dingxiangfei2009 and unassigned mejrs Apr 17, 2026

RalfJung reviewed Apr 20, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

Add a noinline attribute at the callsite when we bail on adding `al…

2414698

…waysinline`

Jamesbarford force-pushed the fix/target-feature-inline-always branch from f55f83e to 2414698 Compare April 21, 2026 12:41

RalfJung reviewed Apr 21, 2026

View reviewed changes

Uh oh!

Conversation

Jamesbarford commented Apr 17, 2026

Uh oh!

rustbot commented Apr 17, 2026

Uh oh!

rustbot commented Apr 17, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

mejrs commented Apr 17, 2026

Uh oh!

RalfJung Apr 20, 2026 • edited by rustbot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jamesbarford Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung commented Apr 20, 2026

Uh oh!

This comment has been minimized.

RalfJung Apr 21, 2026 • edited by rustbot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

RalfJung Apr 20, 2026 •

edited by rustbot

Loading

Jamesbarford Apr 21, 2026 •

edited

Loading

RalfJung Apr 21, 2026 •

edited by rustbot

Loading

RalfJung Apr 21, 2026 •

edited

Loading

RalfJung Apr 21, 2026 •

edited

Loading

RalfJung Apr 22, 2026 •

edited

Loading

RalfJung Apr 22, 2026 •

edited

Loading