Conversation
Latest stacker needs a newer cc than 1.2.16 as older versions don't have windows arm64ec support.
|
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
Given that the previous attempt regressed performance: @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Update the cc crate for rustc_llvm
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (f61a781): comparison URL. Overall result: ❌ regressions - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.0%, secondary 2.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -2.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 490.301s -> 492.119s (0.37%) |
|
I think we need an explanation if not necessarily a solution (though those regressions look rather large, including on full builds). @rustbot author |
Update a bunch of dependencies to reduce windows-sys duplication This gets rid of windows-sys 0.60 and with the exception of curl and stacker it gets rid of windows-sys 0.59. For stacker getting rid of windows-sys 0.59 is blocked on rust-lang/stacker#145 and #155438.
|
The nearest to an explanation seems to be #146186 (comment), cc @madsmtm It would be a bit of a problem if we can never upgrade cc-rs again... |
|
I never got further than that investigation unfortunately |
Update a bunch of dependencies to reduce windows-sys duplication This gets rid of windows-sys 0.60 and with the exception of curl and stacker it gets rid of windows-sys 0.59. For stacker getting rid of windows-sys 0.59 is blocked on rust-lang/stacker#145 and rust-lang/rust#155438.
Update a bunch of dependencies to reduce windows-sys duplication This gets rid of windows-sys 0.60 and with the exception of curl and stacker it gets rid of windows-sys 0.59. For stacker getting rid of windows-sys 0.59 is blocked on rust-lang/stacker#145 and rust-lang/rust#155438.
Update a bunch of dependencies to reduce windows-sys duplication This gets rid of windows-sys 0.60 and with the exception of curl and stacker it gets rid of windows-sys 0.59. For stacker getting rid of windows-sys 0.59 is blocked on rust-lang/stacker#145 and rust-lang/rust#155438.
|
I've ran cachegrind for stm32f4-0.15.1 and I'm now fairly certain that #146186 (comment) was correct about jemalloc being the cause of the slowdown. Cachegrind shows a lot more jemalloc functions after this PR, suggesting less inlining. |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Update the cc crate for rustc_llvm
This comment has been minimized.
This comment has been minimized.
|
Spurious CI failure. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Update the cc crate for rustc_llvm
|
Oops, that was just a PR CI failure, not the try CI. |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (cc1f7b3): comparison URL. Overall result: no relevant changes - no action neededBenchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up. @rustbot label: -S-waiting-on-perf -perf-regression Instruction countThis perf run didn't have relevant results for this metric. Max RSS (memory usage)This perf run didn't have relevant results for this metric. CyclesResults (secondary -2.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 490.12s -> 490.282s (0.03%) |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Update the cc crate for rustc_llvm
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (38101bc): comparison URL. Overall result: no relevant changes - no action neededBenchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up. @rustbot label: -S-waiting-on-perf -perf-regression Instruction countThis perf run didn't have relevant results for this metric. Max RSS (memory usage)Results (secondary 0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.6%, secondary 2.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 487.131s -> 488.722s (0.33%) |
|
That bisects the regression to rust-lang/cc-rs@cc-v1.2.37...cc-v1.2.39. I'm pretty sure the regressing PR was rust-lang/cc-rs#1564 which causes LTO to only be used for C code when |
|
Thanks for reporting! We thought that translating Maybe you can explicitly pass |
|
FWIW, the jemalloc LTO is kind of annoying because it forces us to update the LLVM version by Rust and that of the host compiler in sync, otherwise we get this perf regression during LLVM updates. |
|
I think we are only doing LTO within jemalloc itself (or maybe between jemalloc and LLVM), not between jemalloc and rust code given that we don't build rustc with |
View all comments
Latest stacker needs a newer cc than 1.2.16 as older versions don't have windows arm64ec support.