
Improve performance on edges of GEMM for RISC-V#5674

Draft
ChipKerchner wants to merge 5 commits intoOpenMathLib:developfrom
ChipKerchner:fasterRVVEdges


@ChipKerchner
Contributor

Improve performance on edges of GEMM for RISC-V - up to 9X faster.

[Two images: performance graphs]

@ChipKerchner ChipKerchner marked this pull request as draft March 12, 2026 12:46
@ChipKerchner
Contributor Author

Small values of M and/or N matter for workloads like AI inference.

@ChipKerchner
Contributor Author

This brings utilization closer to peak.

@ChipKerchner
Contributor Author

@martin-frbg What's going on here?

TEST 126/128 fork:safety Attempt 10 timed out (retrying...)
All 10 attempts failed, giving up.

@ChipKerchner
Contributor Author

ChipKerchner commented Mar 12, 2026

This has nothing to do with this patch but could we limit the number of failures (or differences) to a reasonable number (like 200 or so)? It makes for some huge log files and makes looking for failures difficult.

2026-03-12T13:22:24.3299509Z CC 261.500000 DD 261.181854
2026-03-12T13:22:24.3328792Z CC 268.875000 DD 269.182281
2026-03-12T13:22:24.3329295Z CC 259.125000 DD 258.630676
2026-03-12T13:22:24.3330928Z CC 256.062500 DD 256.341339
2026-03-12T13:22:24.3332162Z CC 261.937500 DD 262.319489
2026-03-12T13:22:24.3373825Z SHGEMM FAILURES: 527559!!!
2026-03-12T13:23:13.9093248Z BGEMM FAILURES: 224

BTW, these are probably reasonable differences for SHGEMM - since they are only off in the last bit.
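For illustration only, capping the log output could look roughly like the sketch below. It still counts every mismatch, so the final FAILURES total stays accurate, but prints only the first 200. The function name `count_mismatches`, the `MAX_REPORTED_MISMATCHES` constant, and the absolute tolerance parameter are assumptions for this sketch, not the existing test code.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical cap on printed mismatches; 200 is the figure suggested above. */
#define MAX_REPORTED_MISMATCHES 200

/*
 * Compare a computed result (cc) against a reference (dd) element by element.
 * Every mismatch is counted, but only the first MAX_REPORTED_MISMATCHES are
 * printed, followed by a single "suppressed" notice.
 * Returns the total number of mismatches.
 */
long count_mismatches(const float *cc, const float *dd, long n,
                      float tol, FILE *log)
{
    long failures = 0;

    for (long i = 0; i < n; i++) {
        if (fabsf(cc[i] - dd[i]) > tol) {
            if (failures < MAX_REPORTED_MISMATCHES)
                fprintf(log, "CC %f DD %f\n", cc[i], dd[i]);
            else if (failures == MAX_REPORTED_MISMATCHES)
                fprintf(log, "... further mismatches suppressed\n");
            failures++;
        }
    }
    return failures;
}
```

The caller would print the returned total once at the end, so a run with half a million differences produces about 200 lines instead of half a million.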

@martin-frbg
Collaborator

> @martin-frbg What's going on here?
>
> TEST 126/128 fork:safety Attempt 10 timed out (retrying...)
> All 10 attempts failed, giving up.

If it's C910V, that's a (QEMU) thread lockup in a forked process that has not been reproducible on actual hardware so far.

@ChipKerchner
Contributor Author

Ok, let me report the QEMU issue to my team members.

@martin-frbg
Collaborator

> This has nothing to do with this patch but could we limit the number of failures (or differences) to a reasonable number (like 200 or so)? It makes for some huge log files and makes looking for failures difficult.
>
> 2026-03-12T13:22:24.3299509Z CC 261.500000 DD 261.181854
> 2026-03-12T13:22:24.3328792Z CC 268.875000 DD 269.182281
> 2026-03-12T13:22:24.3329295Z CC 259.125000 DD 258.630676
> 2026-03-12T13:22:24.3330928Z CC 256.062500 DD 256.341339
> 2026-03-12T13:22:24.3332162Z CC 261.937500 DD 262.319489
> 2026-03-12T13:22:24.3373825Z SHGEMM FAILURES: 527559!!!
> 2026-03-12T13:23:13.9093248Z BGEMM FAILURES: 224
>
> BTW, these are probably reasonable differences for SHGEMM - since they are only off in the last bit.

That's probably why we had them printed at some point: to see whether they're all reasonable (and whether the test criteria need to be adjusted)?

@ChipKerchner
Contributor Author

FYI - in the graphs above, M and N are shown modulo the block size (16x8 in this case). But the biggest performance gains are for small M values.
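To make "modulo the block size" concrete, here is a minimal sketch of how a problem of size M x N splits into full blocks plus edge remainders. It assumes the 16 applies to M and the 8 to N; `BLOCK_M`, `BLOCK_N`, `edge_split` and `split_edges` are illustrative names, not identifiers from the patch.

```c
/* Hypothetical block dimensions, taken from the 16x8 figure above. */
enum { BLOCK_M = 16, BLOCK_N = 8 };

typedef struct {
    long full_m;  /* rows covered by complete BLOCK_M-row blocks */
    long edge_m;  /* leftover rows: m % BLOCK_M */
    long full_n;  /* columns covered by complete BLOCK_N-column blocks */
    long edge_n;  /* leftover columns: n % BLOCK_N */
} edge_split;

/* Split an m x n problem into its full-block part and its edge part. */
edge_split split_edges(long m, long n)
{
    edge_split s;

    s.edge_m = m % BLOCK_M;
    s.edge_n = n % BLOCK_N;
    s.full_m = m - s.edge_m;
    s.full_n = n - s.edge_n;
    return s;
}
```

For example, `split_edges(35, 20)` gives 32 full rows with 3 edge rows, and 16 full columns with 4 edge columns. When M is smaller than 16, `full_m` is 0 and the whole problem runs through the edge path, which is consistent with small M showing the biggest gains.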

@martin-frbg
Collaborator

> Ok, let me report the QEMU issue to my team members.

Unfortunately it's a bit complicated - stock qemu doesn't support C910V (to my knowledge) so CI is using some Xuantie fork of qemu9 that may or may not be actively maintained.

@ChipKerchner
Contributor Author

Since C910V only supports RVV 0.7.1, maybe this build shouldn't get kicked off on every check-in. RVV 1.0 was ratified about 3 years ago. Unfortunately, I guess this may still be needed until there is more silicon?

@martin-frbg
Collaborator

> Since C910V only supports RVV 0.7.1, maybe this build shouldn't get kicked off on every check-in. RVV 1.0 was ratified about 3 years ago. Unfortunately, I guess this may still be needed until there is more silicon?

I wonder how many (other) early adopters bought a MilkV Pioneer or similar... by the time everybody has dumped them, we're probably looking at RVV 1.5 or the like and the cycle begins anew. I guess I could look into disabling the utest in CI so that we at least get the build and BLAS tests.

@ChipKerchner
Contributor Author

From what I hear, after RVA23 there won't be a major upgrade until RVA30 (?). There may be a minor one, though. But there are plenty of optional extensions available, and more are being ratified.

I think RVV 1.0 is the way to go. RVV 0.7.1 was mainly a hack.

@martin-frbg
Collaborator

reminds me of Felix LeClerc's FOSDEM talk... we'll see how RISCV64 standards evolve but the earlier generation(s?) of miscreants probably won't go away soon
