Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent constructing or setting attribute with invalid cell_val_num. #4952

Merged
merged 7 commits into from
Jun 6, 2024

Conversation

rroelke
Copy link
Contributor

@rroelke rroelke commented May 7, 2024

While working on the Rust API I observed that the following code would error when datatype == Datatype::Any:

            AttributeBuilder::new(context, field.name(), datatype)?
                .nullability(field.is_nullable())?
                .cell_val_num(cell_val_num)?

the error arises in the call to core function set_cell_val_num, which errors out if the datatype is ANY, even if you set to constants::var_num which is required for ANY.

From there I noticed that it is also possible to construct an Attribute with an invalid cell val num, e.g. construct an ANY attribute with a non-var cell val num.

This pull request adds the check_cell_val_num function and moves all associated error-checking logic into it, then calls this function from all of the Attribute constructors as well as set_cell_val_num.

[sc-46696]


TYPE: BUG
DESC: Prevent constructing attribute with invalid cell_val_num.

@ihnorton ihnorton requested a review from KiterLuc May 7, 2024 12:59
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
@KiterLuc KiterLuc changed the title Prevent constructing or setting attribute with invalid cell_val_num Prevent constructing or setting attribute with invalid cell_val_num. May 7, 2024
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to call the validation function check_*. We have a number of such legacy functions, all of which are poorly documented, and none of which previously threw exceptions. We have two conventions:

  • ensure_*, where the required predicate is in the name
  • validate_*, where the required predicated is an invariant of a class.

This case is of the second kind, where we have an invariant on datatype+number_of_cells. The (preexisting) problem for this PR is that this invariant is inadequately documented in class Attribute.

Upshot:

  1. Call the function validate_*
  2. Add documentation to class Attribute describing the invariant being validated.

tiledb/sm/array_schema/attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Show resolved Hide resolved
tiledb/sm/array_schema/attribute.cc Outdated Show resolved Hide resolved
@rroelke
Copy link
Contributor Author

rroelke commented May 20, 2024

Obviously this is not urgent, but I have addressed the requested changes - I'd appreciate another round of review when the opportunity arises.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code modifications class Attribute are fine. Documentation needs work.

Tests need both more documentation and better internal structure. Sure, with some effort I can figure out what's being tested. The whole point of more documentation and better internal structure is that any future reader won't have to reverse engineer the tests to figure out how good the coverage is.

tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/attribute.cc Show resolved Hide resolved
tiledb/sm/array_schema/attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/attribute.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/CMakeLists.txt Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/test/unit_attribute.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[forgot to hit "Request changes" a moment ago]

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, with the caveat that a few style nits could be improved.

The GENERATE copypasta isn't great, but it's two occurrences and not three or more, so it slides in under my threshold for insisting on common code. I would happy to have a common generator written in the scopy of this PR as well, though it's not necessary.
https://github.com/catchorg/Catch2/blob/devel/docs/generators.md#generator-interface

tiledb/sm/array_schema/test/unit_attribute.cc Show resolved Hide resolved
tiledb/sm/array_schema/attribute.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/attribute.h Outdated Show resolved Hide resolved
Attribute a(
"a", attribute_datatype, valid_cell_val_num, DataOrder::UNORDERED_DATA);
REQUIRE(a.cell_val_num() == valid_cell_val_num);
REQUIRE_THROWS(a.set_cell_val_num(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a small point, but the generator that governs this doesn't cover ANY.

I have been thinking for a while that we need some common test-support generators for each of the classes of basic datatype identifier with some shared property (here what cell_val_num they admit). It might be out of scope to write such things for this PR, but note that this generator is copypasta below. Regardless, their absence means that this DYNAMIC_SECTION might better go in a different test case with a different generator, albeit only that looks almost identical to the one here when directly specified rather than independently defined and separately documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the generator that governs this doesn't cover ANY

This is intentional as all the ANY tests are covered by the separate test cases.

note that this generator is copypasta below

Unfortunately it actually is not, the second datatype generation does not include string types since they (well, STRING_ASCII at least) have some different semantics enumerated in a different TEST_CASE.

(that is something to improve here, is make sure that the non-ascii string types are properly covered by one of the generators)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is something to improve here, is make sure that the non-ascii string types are properly covered by one of the generators

I gave this a shot, they all return something along the lines of Datatype 'STRING_UTF32' is not a valid datatype for an ordered attribute. I don't think they are right to include in this test case but I'll leave a comment.

tiledb/sm/array_schema/test/unit_attribute.cc Show resolved Hide resolved
@KiterLuc KiterLuc merged commit 2593e54 into dev Jun 6, 2024
59 of 60 checks passed
@KiterLuc KiterLuc deleted the rr/sc-46696-Datatype-any-set-cell-var-num branch June 6, 2024 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants