GH-38007: [C++] Add VariableShapeTensor implementation#38008
GH-38007: [C++] Add VariableShapeTensor implementation#38008rok wants to merge 70 commits intoapache:mainfrom
Conversation
eec16ec to
6287402
Compare
dff5441 to
9c3b464
Compare
|
Hi @rok! It would be great to have this in 15.0.0 (added the milestone for visibility). Do you think you have enough bandwidth to work on it at the moment? |
|
@AlenkaF yes I'd very much like to continue working on this, I was waiting for the release before chasing reviewers :D |
pitrou
left a comment
There was a problem hiding this comment.
Some comments. I haven't looked at the tests too closely.
| const auto size = static_cast<int64_t>(permutation.size()); | ||
| std::vector<uint8_t> dim_seen(size, 0); | ||
| ARROW_EXPORT | ||
| Status IsPermutationValid(const std::vector<int64_t>& permutation); |
There was a problem hiding this comment.
These helper functions all could take std::span<const int64_t> as inputs, though not important either.
There was a problem hiding this comment.
Done. I hope std::span won't cause CRAN issues.
| std::vector<int64_t>* strides) { | ||
| auto fixed_width_type = internal::checked_pointer_cast<FixedWidthType>(value_type); | ||
| if (permutation.empty()) { | ||
| return internal::ComputeRowMajorStrides(*fixed_width_type.get(), shape, strides); |
There was a problem hiding this comment.
This one is doing the exact same thing except no permutation, please reconcile the two functions instead of duplicating code.
| return false; | ||
| } | ||
|
|
||
| auto is_permutation_trivial = [](const std::vector<int64_t>& permutation) { |
There was a problem hiding this comment.
Again, please don't duplicate code like this, factor out into common helpers.
| for (const auto& x : document["permutation"].GetArray()) { | ||
| permutation.emplace_back(x.GetInt64()); | ||
| } |
There was a problem hiding this comment.
What happens if "permutation" is not an array of ints?
| assert result.to_tensor().shape == (1, 3, 2, 2) | ||
| assert result.to_tensor().strides == (12 * bw, 1 * bw, 6 * bw, 2 * bw) | ||
|
|
||
| tensor_type = pa.fixed_shape_tensor(arrow_type, [2, 2, 3], permutation=[2, 1, 0]) |
There was a problem hiding this comment.
I suppose it's unrelated but it's testing something useful anyway?
There was a problem hiding this comment.
I forget what this was about. I think it might just be additional test of the same path, but called from fixed_shape_tensor_here.
| const auto start_position = data_array->offset() * byte_width; | ||
| const auto size = std::accumulate(shape.begin(), shape.end(), static_cast<int64_t>(1), | ||
| std::multiplies<>()); | ||
| ARROW_CHECK_EQ(size * byte_width, data_array->length() * byte_width); |
There was a problem hiding this comment.
This can be evidently simplified.
|
Thanks for the review @pitrou! I think I addressed all comments and some more things I found. Could you please do another pass so we can merge this while it's fresh in our minds? |
Rationale for this change
We want to add VariableShapeTensor extension type definition for arrays containing tensors with variable shapes.
What changes are included in this PR?
This adds a C++ implementation.
Are these changes tested?
Yes.
Are there any user-facing changes?
This adds a new extension type C++.