Implement native interleave for ListView #9558
Conversation
Do you mind comparing this to the fallthrough performance of #9562?
Oh for sure, thanks for reminding me!

Updated the description with results now. It's not looking like a win..!
I would say let's merge the fallthrough and iterate on this version. I'm sure there are several possibilities for optimizations.
FWIW I pushed up the branch I've had marinating locally for a month or two in case it's helpful: main...polarsignals:arrow-rs:asubiotto/lvinterleave. I believe the benchmarks showed a slight regression for interleaves of small lists, but overall the perf was an improvement. I'm not able to take a closer look right now, but sharing in case it's helpful.
Thank you! |
Updated implementation and results now!
Sorry for dropping the ball on this! I think this is going in the right direction, but when I pulled this in to try it out I realized that it doesn't work very well when interleaving ListViews with a high number of shared elements (i.e. offset/size windows are overlapping). I think we can get the best of both worlds by computing a heuristic: compare how many values are referenced vs how many values are in the backing array, to figure out whether to do per-row copies as this PR does, or just a full concat of the backing slice, which preserves overlapping encodings and can be much cheaper in the end. Here is a commit that implements that on top of this PR with a benchmark: polarsignals@7cb6880. There is a slight perf hit vs your branch to compute the heuristic (summing referenced sizes), but I think it's worth it in the grand scheme of things:
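For readers following along, the heuristic described above can be sketched in a few lines. This is an illustrative sketch only (the function name and signature are hypothetical, not the arrow-rs or polarsignals code): sum the per-row window sizes being interleaved and compare against the length of the backing values buffer. With heavily shared windows, the rows reference far more values than the buffer holds, and a full concat of the backing slice beats per-row copies.

```rust
/// Hypothetical sketch of the copy-strategy heuristic; not the actual
/// arrow-rs implementation.
///
/// `sizes` are the per-row offset/size window lengths of the ListView
/// rows selected from one source array; `backing_len` is the length of
/// that array's child (values) buffer.
fn prefer_full_concat(sizes: &[usize], backing_len: usize) -> bool {
    // Total number of values the selected rows reference. With shared
    // (overlapping) windows this can greatly exceed `backing_len`.
    let referenced: usize = sizes.iter().sum();
    // If per-row copies would move at least as many values as simply
    // concatenating the whole backing buffer, the concat preserves the
    // overlapping encoding and is cheaper.
    referenced >= backing_len
}

fn main() {
    // Highly shared windows: 4 rows each viewing 10 of 12 backing values.
    assert!(prefer_full_concat(&[10, 10, 10, 10], 12));
    // Sparse selection: 2 rows referencing 3 of 1000 backing values.
    assert!(!prefer_full_concat(&[1, 2], 1000));
    println!("ok");
}
```

The "slight perf hit" mentioned above is the `sizes.iter().sum()` pass itself, which is linear in the number of selected rows.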
Could you perhaps make a PR that adds this case as a benchmark? |
Ref #9558 (comment)

Co-authored-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
ListView support for interleave kernel #9342. This PR adds a native implementation of interleave for the ListView type. Also adds a benchmark.
Performance improves by more than 30% in all cases in the benchmark:
list_view<i64>(0.1,0.1,20) 100
list_view<i64>(0.1,0.1,20) 400
list_view<i64>(0.1,0.1,20) 1024
list_view<i64>(0.1,0.1,20) 1024 4-arr
list_view<i64>(0.0,0.0,20) 100
list_view<i64>(0.0,0.0,20) 400
list_view<i64>(0.0,0.0,20) 1024
list_view<i64>(0.0,0.0,20) 1024 4-arr
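For context on what the kernel being benchmarked does: interleave takes a set of source arrays plus a list of (array index, row index) pairs and builds an output whose rows are drawn in exactly that order. A minimal pure-Rust sketch of that selection semantics over Vec-backed lists (illustrative only; the real arrow-rs kernel operates on Arrow arrays and, for ListView, on offset/size windows into a shared values buffer rather than owned Vecs):

```rust
/// Illustrative sketch of interleave selection semantics: for each
/// (arr, row) pair, take row `row` from source `arr`, in order.
/// Hypothetical helper, not the arrow-rs API.
fn interleave_rows(sources: &[Vec<Vec<i64>>], indices: &[(usize, usize)]) -> Vec<Vec<i64>> {
    indices
        .iter()
        .map(|&(arr, row)| sources[arr][row].clone())
        .collect()
}

fn main() {
    // Two "list arrays": a has rows [1,2] and [3]; b has row [4,5,6].
    let a = vec![vec![1, 2], vec![3]];
    let b = vec![vec![4, 5, 6]];
    // Pick b[0], then a[1], then a[0].
    let out = interleave_rows(&[a, b], &[(1, 0), (0, 1), (0, 0)]);
    assert_eq!(out, vec![vec![4, 5, 6], vec![3], vec![1, 2]]);
    println!("ok");
}
```

The per-row `clone()` here mirrors the per-row copy strategy this PR implements natively; the cost of those copies is what the benchmark parameters above (null density, list length, output size) vary.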