Skip to content

Fix; target feature inline always#155426

Open
Jamesbarford wants to merge 4 commits intorust-lang:mainfrom
Jamesbarford:fix/target-feature-inline-always
Open

Fix; target feature inline always#155426
Jamesbarford wants to merge 4 commits intorust-lang:mainfrom
Jamesbarford:fix/target-feature-inline-always

Conversation

@Jamesbarford
Copy link
Copy Markdown
Contributor

Fixes for #[inline(always)] with target features

  • Checks if the target feature on the caller could affect the callee with an incompatible abi.
  • Checks if a globally enabled target feature makes what would otherwise not be inlined, inlined (see test; tests/ui/target-feature/inline-always-vector-abi-global-avx.rs).
  • If the callee enables any target feature the caller doesn't have, we bail.
  • If the caller enables target features the callee does not have and they do not affect the abi. We inline.
  • If the caller enables target features that affect the abi and the callee does not have those features, we bail.
  • Report errors to user both for caller mismatch and callee mismatch. But only explicitly what the user missed out, e.g we don't report that sse2,ssse3 ... etc is missing if the user wrote avx on a function and the inlining logic bails because of a mismatch.
  • Update tests to support these use cases.

Addresses concerns in #145574

@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 17, 2026

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 17, 2026
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 17, 2026

r? @mejrs

rustbot has assigned @mejrs.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: compiler
  • compiler expanded to 72 candidates
  • Random selection from 18 candidates

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@Jamesbarford Jamesbarford force-pushed the fix/target-feature-inline-always branch from ef34bce to b269cc3 Compare April 17, 2026 11:55
@mejrs
Copy link
Copy Markdown
Contributor

mejrs commented Apr 17, 2026

This is pretty far out of my expertise, so

r? compiler

@rustbot rustbot assigned dingxiangfei2009 and unassigned mejrs Apr 17, 2026
pub fn feature_could_influence_vector_length(&self, feature: &str) -> bool {
self.features_for_correct_fixed_length_vector_abi().iter().any(|(_, name)| *name == feature)
|| self.features_for_correct_scalable_vector_abi() == Some(feature)
}
Copy link
Copy Markdown
Member

@RalfJung RalfJung Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the assumption is that if the caller has at least all the target features the callee has, and also they exactly agree on the target features listed here, then inlining is safe?

I am not sure if that's correct. For instance, the cx16 target feature on x86 is also not inlining-safe -- see llvm/llvm-project#187503.

I think the only safe thing we can do here is have a list of target features that we are certain are safe to inline.

View changes since the review

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are basically doing here is duplicating the per-target InlineFeatureIgnoreList information in LLVM.

Copy link
Copy Markdown
Contributor Author

@Jamesbarford Jamesbarford Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be missing something here.

Using #[inline(always)] on that example would yield the following error;

warning: call to `#[inline(always)]`-annotated `load` requires the same target features to be inlined
  --> ./cmpxchg.rs:13:14
   |
13 |     unsafe { load(x) }
   |              ^^^^^^^
   |
   = note: function will not be inlined
   = note: the following target features are on `load` but missing from `load_core`: cmpxchg16b
note: `load` is defined here
  --> ./cmpxchg.rs:6:1
   |
 6 | fn load(x: *const u128) -> u128 {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = note: `#[warn(inline_always_mismatching_target_features)]` on by default
help: add `#[target_feature]` attribute to `load_core`
   |
12 + #[target_feature(enable = "cmpxchg16b")]
13 | fn load_core(x: *const u128) -> u128 {

The callee is not a subset of the callers target features

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it looks like the load gets inlined irrespective of the attribute, I've tried with no attribute, #[inline] and #[inline(always)] with changes from this branch.

Perhaps if this error is emitted, to ensure that the function is not inlined we could place a noinline attribute at the callsite?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f55f83e implements what I was thinking with noinline along with a test

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's a failure condition if the caller has cx16 and the callee does not, if I understood the issue correctly? Cc @nikic

Generally any difference of target features between caller and callee is extremely suspicious for inlining in an LLVM-based compiler, and the features_for_correct_fixed_length_vector_abi/features_for_correct_scalable_vector_abi were never meant to be used for that check, so I am highly uncomfortable with using them like this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's a failure condition if the caller has cx16 and the callee does not, if I understood the issue correctly?

To make sure I'm following, the inverse of the linked issue? If the caller has cx16 and the callee does not?

Something like;

#![feature(core_intrinsics)]

pub fn load(x: *const u128) -> u128 {
    use std::intrinsics::{AtomicOrdering, atomic_load};
    unsafe { atomic_load::<u128, { AtomicOrdering::Relaxed }>(x) }
}

#[unsafe(no_mangle)]
#[target_feature(enable = "cmpxchg16b")] // <- feature here
fn load_core(x: *const u128) -> u128 {
    unsafe { load(x) }
}

Now the inline'd load(x) will emit a cmpxchg16b as the caller has that target feature https://godbolt.org/z/xcfqveqKs

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah there is something here that I do not understand. load_internal has more target features than its caller so why should that ever get inlined? I thought LLVM only ever inlines when the callee has fewer target features than the caller (and then the target features that are missing from the callee also can't be ABI-relevant).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently features on the InlineFeatureIgnoreList are entirely ignored by the inliner, i.e. the callee is even allowed to have more features than the caller. So that list cannot contain any features that unlock extra instructions. I didn't realize that such target features even exist.

@RalfJung
Copy link
Copy Markdown
Member

Cc @nikic

@rust-log-analyzer

This comment has been minimized.

@Jamesbarford Jamesbarford force-pushed the fix/target-feature-inline-always branch from f55f83e to 2414698 Compare April 21, 2026 12:41
// an attribute and the caller and callee are compatible for
// inlining here. Otherwise we explicitly emit a `noinline` to
// ensure that the function will not get inlined through an LLVM
// pass.
Copy link
Copy Markdown
Member

@RalfJung RalfJung Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To echo the problem @tmiasko raised in #145574, which I now finally understand: LLVM can move a call site to another function via inlining. So whatever reasoning we do here based on the attributes of the current caller is largely pointless since we don't know the attributes of the actual caller that this call may eventually end up in.

View changes since the review

Copy link
Copy Markdown
Member

@RalfJung RalfJung Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually constructing a counterexample could be tricky because LLVM seems to inline alwaysinline functions first, so I have not managed to get it to move around an alwaysinline call site. But that seems like a fragile property to rely on.

The entire approach to target_feature_inline_always seems very unprincipled to me. We have no solid writeup of when alwaysinline is safe to put on a call site in LLVM IR, which means we don't even know the exact property rustc has to check before adding that attribute. This feature needs less "let's implement something and see if it works" (an approach that does not work for optimizations where the test suite is never even close to being able to find all bugs) and more "let's figure out a principled argument for why what we want to do could be correct".

Copy link
Copy Markdown
Member

@RalfJung RalfJung Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, based on the most recent example by @tmiasko , I am not convinced that there even exists a sound way to use alwaysinline in LLVM IR for calls to functions with extra target features. Even the most conservative option where we require full feature equality seems to go wrong:

#[inline(never)]
#[target_feature(enable = "sse")]
pub fn i(x: &__m256) {
  std::hint::black_box(x);
}

#[inline(always)]
#[target_feature(enable = "sse")]
pub fn f(x: &__m256) {
    i(x);
}

#[target_feature(enable = "sse")]
pub fn g(x: &__m256) {
    f(x) // alwaysinline call site
}

#[target_feature(enable = "sse", enable = "avx")]
pub fn h(x: &__m256) {
    g(x)
}

Imagine LLVM first inlines g into h (sound because none of the calls in g has a target-feature-dependent ABI). Then LLVM changes i to receive the argument by-value (sound because the only caller and callee have the same target features). We end up with:

#[inline(never)]
#[target_feature(enable = "sse")]
pub fn i(x: __m256) { // an actual by-value argument
  std::hint::black_box(x);
}

#[target_feature(enable = "sse")]
pub fn f(x: &__m256) {
    i(*x);
}

#[target_feature(enable = "sse", enable = "avx")]
pub fn h(x: &__m256) {
    f(x) // alwaysinline call site
}

Now we inline f:

#[inline(never)]
#[target_feature(enable = "sse")]
pub fn i(x: __m256) { // an actual by-value argument
  std::hint::black_box(x);
}

#[target_feature(enable = "sse", enable = "avx")]
pub fn h(x: &__m256) {
    i(*x)
}

Now the caller and callee disagree on the ABI. Oopsie...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand your point, but I may still be missing something.

I took your example, removed #[inline(always)], and compared the output here: https://godbolt.org/z/8xs8Gfjbj. In both cases, whether f has #[inline(always)] or not, h ends up directly calling i.

I also tried removing all pub declarations except on h: https://godbolt.org/z/onszcGPPc. That produces the same output both with and without #[inline(always)] on f.

So from these examples, it does not seem like #[inline(always)] is making the situation less safe. The underlying problem looks like an LLVM issue that exists regardless, rather than something introduced by the attribute itself.

We have no solid writeup of when alwaysinline is safe to put on a call site in LLVM IR

What kind of writeup would you want here? The current code comments are probably not enough, but I want to make sure I understand what is missing and where you would expect it to live.

The rule this PR is trying to encode is:

  • The caller must not affect the callee's ABI.
  • The callee's target features must be a subset of the caller's target features.

If both conditions hold, we apply alwaysinline at the call site. If either condition fails, we instead apply noinline, which prevents further inlining at that call site.

The goal of this PR is not to solve every inlining-related issue. It is to make problematic cases of the specific #[inline(always)] attribute usage easier to detect and safer to handle.

For example, in the cx16 issue (load_internal as the callee), condition 2 would fail if load_internal were marked #[inline(always)]. In that case, this PR would emit a warning and apply noinline at the call site, preventing further LLVM inlining there. That is safer than today's #[inline] behaviour, because we both surface the problem to the user and block additional inlining at that call site. At the same time, because alwaysinline is no longer attached to the function definition itself like inlinehint, valid uses of load_internal can still be inlined.

Likewise, in the example from the tracking issue, which is the basis for this test case. Condition 1 fails, because the caller affects the callee's ABI.

Copy link
Copy Markdown
Member

@RalfJung RalfJung Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your example, removed #[inline(always)], and compared the output here: https://godbolt.org/z/8xs8Gfjbj. In both cases, whether f has #[inline(always)] or not, h ends up directly calling i.

I also tried removing all pub declarations except on h: https://godbolt.org/z/onszcGPPc. That produces the same output both with and without #[inline(always)] on f.

Yes, LLVM today happens to do inlining in a different order that avoids the problem, at least for this particular example. But I would not want to bet the soundness of the language on LLVM always sticking to this inlining order.

That's why I described the expected order of applied optimizations in my example. You have to apply the optimizations manually to confirm (or reject) my reasoning. But as long as it is permitted for LLVM to do these optimizations in this order, that means the code we generate is unsound.

Copy link
Copy Markdown
Member

@RalfJung RalfJung Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of writeup would you want here?

I would like to see an argument for why the rules you came up with are enough to guarantee that we avoid LLVM's soundness bugs around alwaysinline. That's the main goal here, after all: alwaysinline is fundamentally busted, but this feature is trying to use it soundly somehow. So far I am not convinced that is even possible. The example above shows, I think, that even with the very strict rule "caller and callee must have the exact same target features", alwaysinline is still unsound.

You cannot argue for the correctness of those rules by pointing at some examples. You can only make such an argument by saying: here's how inlining in LLVM works, and here is why under every possible inlining choice LLVM might make on any program, the result will be sound. Usually we leave that work to LLVM, but the entire premise of this approach is that we do the work ourselves because LLVM doesn't do it properly. That's fundamentally very hard (much harder than doing it in LLVM) as it requires reasoning "behind LLVM's back", so we need to be very careful and deliberate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants