Plumb pylibcudf strings contains_re through cudf_polars #15918

Merged
27 commits
05d1eb4
initial
brandon-b-miller May 28, 2024
004ae1d
Merge branch 'branch-24.08' into pylibcudf-strings-contains
brandon-b-miller May 28, 2024
9a3f19d
merge/resolve
brandon-b-miller May 29, 2024
87341d4
one test
brandon-b-miller May 29, 2024
6d50191
tests, fixes
brandon-b-miller May 29, 2024
e35cd9a
declaration
brandon-b-miller May 29, 2024
69ad703
Merge branch 'branch-24.08' into pylibcudf-strings-contains
brandon-b-miller May 30, 2024
83178c9
docs, style
brandon-b-miller May 31, 2024
758755c
type create more strongly
brandon-b-miller May 31, 2024
98aeefa
add more tests
brandon-b-miller May 31, 2024
936e412
style
brandon-b-miller May 31, 2024
b15588a
regex program tests
brandon-b-miller May 31, 2024
b5a68c5
Merge branch 'branch-24.08' into pylibcudf-strings-contains
brandon-b-miller Jun 3, 2024
4b6a393
polars contains_re plumbing
brandon-b-miller Jun 4, 2024
6c125cb
refactor expr
brandon-b-miller Jun 5, 2024
9fb3a2b
add tests for invalid regex
brandon-b-miller Jun 5, 2024
0463688
merge latest/resolve conflicts
brandon-b-miller Jun 6, 2024
7543726
cleanup
brandon-b-miller Jun 6, 2024
42b158f
Address reviews
brandon-b-miller Jun 6, 2024
39b57ca
merge latest/resolve conflicts
brandon-b-miller Jun 10, 2024
e3fb170
refactor logic
brandon-b-miller Jun 10, 2024
da08309
merge latest/resolve
brandon-b-miller Jun 12, 2024
e45fbed
add literal column tests, support it, refactor logic
brandon-b-miller Jun 12, 2024
22e1031
add tests, refactor
brandon-b-miller Jun 12, 2024
4b643a7
pacify mypy
brandon-b-miller Jun 12, 2024
5533e5b
Make type-narrowing a no-op if run with `-O`
wence- Jun 13, 2024
ee42757
Merge branch 'branch-24.08' into cudf-polars-str-contains
wence- Jun 13, 2024
49 changes: 26 additions & 23 deletions python/cudf_polars/cudf_polars/dsl/expr.py
@@ -667,36 +667,39 @@ def do_evaluate(
mapping: Mapping[Expr, Column] | None = None,
) -> Column:
"""Evaluate this expression given a dataframe for context."""
columns = [
child.evaluate(df, context=context, mapping=mapping)
for child in self.children
]
if self.name == pl_expr.StringFunction.Lowercase:
(column,) = columns
return Column(plc.strings.case.to_lower(column.obj))
elif self.name == pl_expr.StringFunction.Uppercase:
(column,) = columns
return Column(plc.strings.case.to_upper(column.obj))
elif self.name == pl_expr.StringFunction.EndsWith:
column, suffix = columns
return Column(plc.strings.find.ends_with(column.obj, suffix.obj))
elif self.name == pl_expr.StringFunction.StartsWith:
column, suffix = columns
return Column(plc.strings.find.starts_with(column.obj, suffix.obj))
elif self.name == pl_expr.StringFunction.Contains:
column, pattern = columns
if self.name == pl_expr.StringFunction.Contains:
child, pattern = self.children
column = child.evaluate(df, context=context, mapping=mapping)
assert isinstance(pattern, Literal)

literal, _ = self.options
if literal:
return Column(plc.strings.find.contains(column.obj, pattern.obj))
return Column(plc.strings.find.contains(column.obj, pattern.value))
Contributor

For literal `contains`, can we support columns as patterns? libcudf does: https://docs.rapids.ai/api/libcudf/stable/group__strings__find#ga9acbc587765007c5e35811c07e990b03

Contributor Author

Supported this in e45fbed.
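
To make the distinction in this thread concrete, here is a minimal pure-Python sketch of the two literal-match shapes: one scalar pattern applied to every row, versus a pattern column matched row by row. The helper names `contains_scalar` and `contains_column` are hypothetical stand-ins, not pylibcudf functions. Plain lists stand in for device columns, and propagating `None` for null rows is an assumption of the sketch.

```python
from __future__ import annotations


def contains_scalar(column: list[str | None], pattern: str) -> list[bool | None]:
    """Check every row against the same literal pattern (hypothetical helper)."""
    # Assumption: a null input row yields a null output row.
    return [None if s is None else pattern in s for s in column]


def contains_column(
    column: list[str | None], patterns: list[str | None]
) -> list[bool | None]:
    """Check row i of `column` against row i of `patterns` (hypothetical helper)."""
    if len(column) != len(patterns):
        raise ValueError("column and patterns must have the same length")
    # Assumption: a null on either side yields a null output row.
    return [
        None if s is None or p is None else p in s
        for s, p in zip(column, patterns)
    ]


data = ["apple", "banana", None, "cherry"]
scalar_result = contains_scalar(data, "an")
column_result = contains_column(data, ["pp", "xyz", "a", "err"])
print(scalar_result)
print(column_result)
```

The scalar form is what a `Literal` pattern maps to, while the column form needs the pattern expression to be evaluated against the dataframe first.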

else:
# TODO: hack
pattern = plc.interop.to_arrow(pattern.obj).as_py()
prog = plc.strings.regex_program.RegexProgram.create(
pattern, flags=plc.strings.regex_flags.RegexFlags.DEFAULT
pattern.value.as_py(),
flags=plc.strings.regex_flags.RegexFlags.DEFAULT,
)
return Column(plc.strings.contains.contains_re(column.obj, prog))
else:
raise NotImplementedError(f"StringFunction {self.name}")
columns = [
child.evaluate(df, context=context, mapping=mapping)
for child in self.children
]
if self.name == pl_expr.StringFunction.Lowercase:
(column,) = columns
return Column(plc.strings.case.to_lower(column.obj))
elif self.name == pl_expr.StringFunction.Uppercase:
(column,) = columns
return Column(plc.strings.case.to_upper(column.obj))
elif self.name == pl_expr.StringFunction.EndsWith:
column, suffix = columns
return Column(plc.strings.find.ends_with(column.obj, suffix.obj))
elif self.name == pl_expr.StringFunction.StartsWith:
column, suffix = columns
return Column(plc.strings.find.starts_with(column.obj, suffix.obj))
else:
raise NotImplementedError(f"StringFunction {self.name}")


class Sort(Expr):
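
The control flow that this hunk moves towards can be sketched in plain Python: `Contains` is handled first, with its pattern read from a literal node rather than from an evaluated column, and only the remaining string functions evaluate every child up front. Everything below is a hypothetical stand-in written for illustration: `Literal`, `Col`, and `evaluate_string_function` are not the cudf_polars classes, string names replace the `pl_expr.StringFunction` members, and Python's `re` module replaces the libcudf regex engine, whose syntax may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Literal:
    """Hypothetical stand-in for a literal expression node."""

    value: str


@dataclass
class Col:
    """Hypothetical stand-in for a column reference node."""

    name: str

    def evaluate(self, df: dict[str, list[str]]) -> list[str]:
        return df[self.name]


def evaluate_string_function(name, children, options, df):
    """Dispatch sketch: handle "contains" before evaluating every child."""
    if name == "contains":
        child, pattern = children
        # Explicit check instead of `assert`, so it still runs under `python -O`.
        if not isinstance(pattern, Literal):
            raise NotImplementedError("only literal patterns in this sketch")
        column = child.evaluate(df)
        literal, _ = options
        if literal:
            # Literal match: plain substring test, no regex compilation.
            return [pattern.value in s for s in column]
        # Regex match: compile once, then reuse the program for every row.
        prog = re.compile(pattern.value)
        return [prog.search(s) is not None for s in column]
    # All other functions evaluate every child up front.
    columns = [c.evaluate(df) for c in children]
    if name == "lowercase":
        (column,) = columns
        return [s.lower() for s in column]
    if name == "uppercase":
        (column,) = columns
        return [s.upper() for s in column]
    raise NotImplementedError(f"StringFunction {name}")


df = {"s": ["a.c", "abc", "xyz"]}
args = [Col("s"), Literal("a.c")]
literal_hits = evaluate_string_function("contains", args, (True, True), df)
regex_hits = evaluate_string_function("contains", args, (False, True), df)
upper_hits = evaluate_string_function("uppercase", [Col("s")], (), df)
print(literal_hits, regex_hits, upper_hits)
```

The same pattern `a.c` gives different results in the two modes because `.` is a wildcard only in the regex branch. The sketch raises explicitly where the diff uses `assert isinstance(pattern, Literal)`; assertions are stripped under `python -O`, so that swap is a choice made for this illustration and not a claim about the PR.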