Fix; target feature inline always#155426
Fix; target feature inline always#155426Jamesbarford wants to merge 4 commits intorust-lang:mainfrom
Conversation
|
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
|
r? @mejrs rustbot has assigned @mejrs. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
8b2613c to
ef34bce
Compare
This comment has been minimized.
This comment has been minimized.
ef34bce to
b269cc3
Compare
|
This is pretty far out of my expertise, so r? compiler |
| pub fn feature_could_influence_vector_length(&self, feature: &str) -> bool { | ||
| self.features_for_correct_fixed_length_vector_abi().iter().any(|(_, name)| *name == feature) | ||
| || self.features_for_correct_scalable_vector_abi() == Some(feature) | ||
| } |
There was a problem hiding this comment.
So the assumption is that if the caller has at least all the target features the callee has, and also they exactly agree on the target features listed here, then inlining is safe?
I am not sure if that's correct. For instance, the cx16 target feature on x86 is also not inlining-safe -- see llvm/llvm-project#187503.
I think the only safe thing we can do here is have a list of target features that we are certain are safe to inline.
There was a problem hiding this comment.
What you are basically doing here is duplicating the per-target InlineFeatureIgnoreList information in LLVM.
There was a problem hiding this comment.
I could be missing something here.
Using #[inline(always)] on that example would yield the following error;
warning: call to `#[inline(always)]`-annotated `load` requires the same target features to be inlined
--> ./cmpxchg.rs:13:14
|
13 | unsafe { load(x) }
| ^^^^^^^
|
= note: function will not be inlined
= note: the following target features are on `load` but missing from `load_core`: cmpxchg16b
note: `load` is defined here
--> ./cmpxchg.rs:6:1
|
6 | fn load(x: *const u128) -> u128 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: `#[warn(inline_always_mismatching_target_features)]` on by default
help: add `#[target_feature]` attribute to `load_core`
|
12 + #[target_feature(enable = "cmpxchg16b")]
13 | fn load_core(x: *const u128) -> u128 {
The callee is not a subset of the callers target features
There was a problem hiding this comment.
Though it looks like the load gets inlined irrespective of the attribute, I've tried with no attribute, #[inline] and #[inline(always)] with changes from this branch.
Perhaps if this error is emitted, to ensure that the function is not inlined we could place a noinline attribute at the callsite?
There was a problem hiding this comment.
f55f83e implements what I was thinking with noinline along with a test
There was a problem hiding this comment.
It seems like there's a failure condition if the caller has cx16 and the callee does not, if I understood the issue correctly? Cc @nikic
Generally any difference of target features between caller and callee is extremely suspicious for inlining in an LLVM-based compiler, and the features_for_correct_fixed_length_vector_abi/features_for_correct_scalable_vector_abi were never meant to be used for that check, so I am highly uncomfortable with using them like this.
There was a problem hiding this comment.
It seems like there's a failure condition if the caller has cx16 and the callee does not, if I understood the issue correctly?
To make sure I'm following, the inverse of the linked issue? If the caller has cx16 and the callee does not?
Something like;
#![feature(core_intrinsics)]
pub fn load(x: *const u128) -> u128 {
use std::intrinsics::{AtomicOrdering, atomic_load};
unsafe { atomic_load::<u128, { AtomicOrdering::Relaxed }>(x) }
}
#[unsafe(no_mangle)]
#[target_feature(enable = "cmpxchg16b")] // <- feature here
fn load_core(x: *const u128) -> u128 {
unsafe { load(x) }
}Now the inline'd load(x) will emit a cmpxchg16b as the caller has that target feature https://godbolt.org/z/xcfqveqKs
There was a problem hiding this comment.
Hm yeah there is something here that I do not understand. load_internal has more target features than its caller so why should that ever get inlined? I thought LLVM only ever inlines when the callee has fewer target features than the caller (and then the target features that are missing from the callee also can't be ABI-relevant).
There was a problem hiding this comment.
Apparently features on the InlineFeatureIgnoreList are entirely ignored by the inliner, i.e. the callee is even allowed to have more features than the caller. So that list cannot contain any features that unlock extra instructions. I didn't realize that such target features even exist.
|
Cc @nikic |
This comment has been minimized.
This comment has been minimized.
f55f83e to
2414698
Compare
| // an attribute and the caller and callee are compatible for | ||
| // inlining here. Otherwise we explicitly emit a `noinline` to | ||
| // ensure that the function will not get inlined through an LLVM | ||
| // pass. |
There was a problem hiding this comment.
To echo the problem @tmiasko raised in #145574, which I now finally understand: LLVM can move a call site to another function via inlining. So whatever reasoning we do here based on the attributes of the current caller is largely pointless since we don't know the attributes of the actual caller that this call may eventually end up in.
There was a problem hiding this comment.
Actually constructing a counterexample could be tricky because LLVM seems to inline alwaysinline functions first, so I have not managed to get it to move around an alwaysinline call site. But that seems like a fragile property to rely on.
The entire approach to target_feature_inline_always seems very unprincipled to me. We have no solid writeup of when alwaysinline is safe to put on a call site in LLVM IR, which means we don't even know the exact property rustc has to check before adding that attribute. This feature needs less "let's implement something and see if it works" (an approach that does not work for optimizations where the test suite is never even close to being able to find all bugs) and more "let's figure out a principled argument for why what we want to do could be correct".
There was a problem hiding this comment.
Right now, based on the most recent example by @tmiasko , I am not convinced that there even exists a sound way to use alwaysinline in LLVM IR for calls to functions with extra target features. Even the most conservative option where we require full feature equality seems to go wrong:
#[inline(never)]
#[target_feature(enable = "sse")]
pub fn i(x: &__m256) {
std::hint::black_box(x);
}
#[inline(always)]
#[target_feature(enable = "sse")]
pub fn f(x: &__m256) {
i(x);
}
#[target_feature(enable = "sse")]
pub fn g(x: &__m256) {
f(x) // alwaysinline call site
}
#[target_feature(enable = "sse", enable = "avx")]
pub fn h(x: &__m256) {
g(x)
}Imagine LLVM first inlines g into h (sound because none of the calls in g has a target-feature-dependent ABI). Then LLVM changes i to receive the argument by-value (sound because the only caller and callee have the same target features). We end up with:
#[inline(never)]
#[target_feature(enable = "sse")]
pub fn i(x: __m256) { // an actual by-value argument
std::hint::black_box(x);
}
#[target_feature(enable = "sse")]
pub fn f(x: &__m256) {
i(*x);
}
#[target_feature(enable = "sse", enable = "avx")]
pub fn h(x: &__m256) {
f(x) // alwaysinline call site
}Now we inline f:
#[inline(never)]
#[target_feature(enable = "sse")]
pub fn i(x: __m256) { // an actual by-value argument
std::hint::black_box(x);
}
#[target_feature(enable = "sse", enable = "avx")]
pub fn h(x: &__m256) {
i(*x)
}Now the caller and callee disagree on the ABI. Oopsie...
There was a problem hiding this comment.
I think I understand your point, but I may still be missing something.
I took your example, removed #[inline(always)], and compared the output here: https://godbolt.org/z/8xs8Gfjbj. In both cases, whether f has #[inline(always)] or not, h ends up directly calling i.
I also tried removing all pub declarations except on h: https://godbolt.org/z/onszcGPPc. That produces the same output both with and without #[inline(always)] on f.
So from these examples, it does not seem like #[inline(always)] is making the situation less safe. The underlying problem looks like an LLVM issue that exists regardless, rather than something introduced by the attribute itself.
We have no solid writeup of when alwaysinline is safe to put on a call site in LLVM IR
What kind of writeup would you want here? The current code comments are probably not enough, but I want to make sure I understand what is missing and where you would expect it to live.
The rule this PR is trying to encode is:
- The caller must not affect the callee's ABI.
- The callee's target features must be a subset of the caller's target features.
If both conditions hold, we apply alwaysinline at the call site. If either condition fails, we instead apply noinline, which prevents further inlining at that call site.
The goal of this PR is not to solve every inlining-related issue. It is to make problematic cases of the specific #[inline(always)] attribute usage easier to detect and safer to handle.
For example, in the cx16 issue (load_internal as the callee), condition 2 would fail if load_internal were marked #[inline(always)]. In that case, this PR would emit a warning and apply noinline at the call site, preventing further LLVM inlining there. That is safer than today's #[inline] behaviour, because we both surface the problem to the user and block additional inlining at that call site. At the same time, because alwaysinline is no longer attached to the function definition itself like inlinehint, valid uses of load_internal can still be inlined.
Likewise, in the example from the tracking issue, which is the basis for this test case. Condition 1 fails, because the caller affects the callee's ABI.
There was a problem hiding this comment.
I took your example, removed #[inline(always)], and compared the output here: https://godbolt.org/z/8xs8Gfjbj. In both cases, whether f has #[inline(always)] or not, h ends up directly calling i.
I also tried removing all pub declarations except on h: https://godbolt.org/z/onszcGPPc. That produces the same output both with and without #[inline(always)] on f.
Yes, LLVM today happens to do inlining in a different order that avoids the problem, at least for this particular example. But I would not want to bet the soundness of the language on LLVM always sticking to this inlining order.
That's why I described the expected order of applied optimizations in my example. You have to apply the optimizations manually to confirm (or reject) my reasoning. But as long as it is permitted for LLVM to do these optimizations in this order, that means the code we generate is unsound.
There was a problem hiding this comment.
What kind of writeup would you want here?
I would like to see an argument for why the rules you came up with are enough to guarantee that we avoid LLVM's soundness bugs around alwaysinline. That's the main goal here, after all: alwaysinline is fundamentally busted, but this feature is trying to use it soundly somehow. So far I am not convinced that is even possible. The example above shows, I think, that even with the very strict rule "caller and callee must have the exact same target features", alwaysinline is still unsound.
You cannot argue for the correctness of those rules by pointing at some examples. You can only make such an argument by saying: here's how inlining in LLVM works, and here is why under every possible inlining choice LLVM might make on any program, the result will be sound. Usually we leave that work to LLVM, but the entire premise of this approach is that we do the work ourselves because LLVM doesn't do it properly. That's fundamentally very hard (much harder than doing it in LLVM) as it requires reasoning "behind LLVM's back", so we need to be very careful and deliberate.
Fixes for
#[inline(always)]with target featurestests/ui/target-feature/inline-always-vector-abi-global-avx.rs).sse2,ssse3 ... etcis missing if the user wroteavxon a function and the inlining logic bails because of a mismatch.Addresses concerns in #145574