Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate sqllogictest for String, LargeString and StringView behavior #12525

Merged
merged 4 commits into from
Sep 20, 2024

Conversation

goldmedal
Copy link
Contributor

Which issue does this PR close?

Part of #12415

Rationale for this change

I am implementing a testing framework for string types to ensure consistent behavior across different string types. I have extracted test cases from datafusion/sqllogictest/test_files/string_view.slt, which contains many good tests for string types.

Currently, I haven’t moved all the tests from string_view.slt because I don’t want to make this PR too large (I want to confirm if I’m on the right way 🤔). For now, I’ve only moved the following case:

  • column comparison as filters
  • column comparison
  • column to StringView scalar comparison
  • column to String scalar
  • column to LargeString scalar
  • substr function
  • distinct aggregate
  • STARTS_WITH function
  • TRANSLATE function
  • REGEXP_REPLACE function
  • Initcap function

Some TODO items should be finished in the follow-up PR

  • The remaining tests in the string_view.slt
    • LIKE/ILIKE
    • SUBSTR planing
    • ASCII
    • BTRIM
    • LTRIM
    • RTRIM
    • CHARACTER_LENGTH
    • CONCAT
    • CONTAINS
    • ENDS_WITH
    • LEVENSHTEIN
    • LOWER
    • UPPER
    • LPAD
    • OCTET_LENGTH
    • OVERLAY
    • REGEXP_LIKE
    • REGEXP_MATCH
    • REPEAT
    • REPLACE
    • REVERSE
    • RIGHT
    • LEFT
    • RPAD
    • SPLIT_PART
    • STRPOS
    • SUBSTRINDEX
    • FIND_IN_SET
    • || operator
    • ~ operator (regex match)
    • ~* operator (regex match case-insensitive)
    • !~~ operator (not like match)
    • !~~* operator (not like match case-insensitive)
    • Move StringView specific tests to string/string_view.slt
  • Move string literal tests from function.slt to string/string_literal.rs

What changes are included in this PR?

Refer to #12433, I made some changes. I regroup the test case by the data type. The file structure is

string/
    - init_data.slt.part        // generate the testing data
    - string_query.slt.part     // the sharing tests for all string type
    - string.slt                // the entrypoint for string type
    - large_string.slt          // the entrypoint for large_string type
    - string_view.slt           // the entrypoint for string_view type and the string_view specific tests
    - string_literal.slt        // the tests for string literal

Any entry point file should include init_data.slt.part and string_query.slt.part.

For planning tests for StringView (e.g. EXPLAIN ...), I moved to the string/string_view.slt because it's only used to assert the planning behavior of StringView.

Are these changes tested?

Refactor and enhance the original tests.

Are there any user-facing changes?

no

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 18, 2024
@goldmedal goldmedal changed the title Feature/12415 string test Automate sqllogictest for String, LargeString and StringView behavior Sep 18, 2024
Comment on lines +19 to +23
create table test_source as values
('Andrew', 'X', 'datafusion📊🔥', '🔥'),
('Xiangpeng', 'Xiangpeng', 'datafusion数据融合', 'datafusion数据融合'),
('Raphael', 'R', 'datafusionДатаФусион', 'аФус'),
(NULL, 'R', NULL, '🔥');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we started to implement some ASCII-faster paths, I think we should always consider testing both ASCII-only and Unicode cases.

@goldmedal goldmedal marked this pull request as ready for review September 18, 2024 16:25
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @goldmedal

I really like this structured, and I think the tests are super well commented and easy to follow. Really nice ❤️ 🏆

@2010YOUY01 do you perhaps have time to take a look at this PR as well?

cc @tshauck

drop table test_substr_base;


# --------------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is well organized 👍🏼

I have one minor suggestion: we should add a separate README under test_files/string/ to explain the structure

@goldmedal
Copy link
Contributor Author

This is well organized 👍🏼

I have one minor suggestion: we should add a separate README under test_files/string/ to explain the structure

Nice idea! I have added the README for them. Thanks for the suggestion 🙇

@alamb
Copy link
Contributor

alamb commented Sep 20, 2024

THANK YOU SO MUCH @goldmedal and @2010YOUY01 -- this looks really great

@alamb alamb merged commit b412436 into apache:main Sep 20, 2024
24 checks passed
@goldmedal goldmedal deleted the feature/12415-string-test branch September 21, 2024 05:30
@goldmedal
Copy link
Contributor Author

Thanks @alamb @2010YOUY01 🙇‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants