Skip to content

Comments

GH-38007: [C++] Add VariableShapeTensor implementation#38008

Open
rok wants to merge 70 commits intoapache:mainfrom
rok:38007
Open

GH-38007: [C++] Add VariableShapeTensor implementation#38008
rok wants to merge 70 commits intoapache:mainfrom
rok:38007

Conversation

@rok
Copy link
Member

@rok rok commented Oct 4, 2023

Rationale for this change

We want to add VariableShapeTensor extension type definition for arrays containing tensors with variable shapes.

What changes are included in this PR?

This adds a C++ implementation.

Are these changes tested?

Yes.

Are there any user-facing changes?

This adds a new extension type C++.

@AlenkaF
Copy link
Member

AlenkaF commented Nov 16, 2023

Hi @rok! It would be great to have this in 15.0.0 (added the milestone for visibility). Do you think you have enough bandwidth to work on it at the moment?

@rok
Copy link
Member Author

rok commented Nov 16, 2023

@AlenkaF yes I'd very much like to continue working on this, I was waiting for the release before chasing reviewers :D
I'll rebase tonight and would very much appreciate reviews afterwards.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 23, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Nov 28, 2023
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I haven't looked at the tests too closely.

const auto size = static_cast<int64_t>(permutation.size());
std::vector<uint8_t> dim_seen(size, 0);
ARROW_EXPORT
Status IsPermutationValid(const std::vector<int64_t>& permutation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helper functions all could take std::span<const int64_t> as inputs, though not important either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I hope std::span won't cause CRAN issues.

std::vector<int64_t>* strides) {
auto fixed_width_type = internal::checked_pointer_cast<FixedWidthType>(value_type);
if (permutation.empty()) {
return internal::ComputeRowMajorStrides(*fixed_width_type.get(), shape, strides);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is doing the exact same thing except no permutation, please reconcile the two functions instead of duplicating code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I addressed this.

return false;
}

auto is_permutation_trivial = [](const std::vector<int64_t>& permutation) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please don't duplicate code like this, factor out into common helpers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factored out.

Comment on lines 170 to 172
for (const auto& x : document["permutation"].GetArray()) {
permutation.emplace_back(x.GetInt64());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if "permutation" is not an array of ints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some checks.

assert result.to_tensor().shape == (1, 3, 2, 2)
assert result.to_tensor().strides == (12 * bw, 1 * bw, 6 * bw, 2 * bw)

tensor_type = pa.fixed_shape_tensor(arrow_type, [2, 2, 3], permutation=[2, 1, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's unrelated but it's testing something useful anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget what this was about. I think it might just be additional test of the same path, but called from fixed_shape_tensor_here.

const auto start_position = data_array->offset() * byte_width;
const auto size = std::accumulate(shape.begin(), shape.end(), static_cast<int64_t>(1),
std::multiplies<>());
ARROW_CHECK_EQ(size * byte_width, data_array->length() * byte_width);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be evidently simplified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! Done.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 20, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Feb 20, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 20, 2026
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 20, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 20, 2026
@rok
Copy link
Member Author

rok commented Feb 21, 2026

Thanks for the review @pitrou! I think I addressed all comments and some more things I found. Could you please do another pass so we can merge this while it's fresh in our minds?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Python] Add VariableShapeTensor implementation

6 participants