Implement replace in pylibcudf #15005
Merged: rapids-bot merged 7 commits into rapidsai:branch-24.04 from vyasr:feat/pylibcudf_replace on Feb 8, 2024
Commits (7)
86b03c7 Generate cpdef enum (vyasr)
4bab632 Implement replace in pylibcudf (vyasr)
f4a85fd Use pylibcudf for everything except replace_nulls (vyasr)
fd1a1b3 Make replace_nulls work with a runtime ReplacePolicy and use it (vyasr)
912c8e0 Merge remote-tracking branch 'upstream/branch-24.04' into feat/pylibc… (vyasr)
b3e9ac6 style (vyasr)
7f39af0 Fix format (vyasr)
@@ -18,6 +18,7 @@ This page provides API documentation for pylibcudf.
reduce
rolling
scalar
replace
table
types
unary
@@ -0,0 +1,6 @@
=======
replace
=======

.. automodule:: cudf._lib.pylibcudf.replace
    :members:
Empty file.
@@ -0,0 +1,36 @@
# Copyright (c) 2023-2024, NVIDIA CORPORATION.

from libcpp cimport bool

from cudf._lib.cpp.replace cimport replace_policy

from .column cimport Column
from .scalar cimport Scalar

ctypedef fused ReplacementType:
    Column
    Scalar
    replace_policy
    # Allowing object is a workaround for
    # https://github.com/cython/cython/issues/5984. See the implementation of
    # replace_nulls for details.
    object


cpdef Column replace_nulls(Column source_column, ReplacementType replacement)

cpdef Column find_and_replace_all(
    Column source_column,
    Column values_to_replace,
    Column replacement_values,
)

cpdef Column clamp(
    Column source_column,
    Scalar lo,
    Scalar hi,
    Scalar lo_replace=*,
    Scalar hi_replace=*,
)

cpdef Column normalize_nans_and_zeros(Column source_column, bool inplace=*)
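
To make the three `ReplacementType` cases concrete, here is a minimal pure-Python sketch of the null-replacement semantics that `replace_nulls` dispatches on. It is a reference model only, not pylibcudf code: plain lists stand in for `Column`, `None` stands in for a null, any other value stands in for a `Scalar`, and the `ReplacePolicy` enum and `fill_by_policy` helper are hypothetical stand-ins. It also assumes that a null with no non-null neighbor in the policy's direction stays null.

```python
from enum import Enum


class ReplacePolicy(Enum):
    """Hypothetical stand-in for libcudf's replace_policy enum."""
    PRECEDING = 0
    FOLLOWING = 1


def fill_by_policy(source, policy):
    """Fill each null (None) from the nearest non-null in the policy's direction."""
    values = list(source)
    order = range(len(values))
    if policy is ReplacePolicy.FOLLOWING:
        # Walk right-to-left so that "last seen" is the value that follows.
        order = reversed(order)
    last_seen = None  # Assumption: with no neighbor yet, the null stays null.
    for i in order:
        if values[i] is None:
            values[i] = last_seen
        else:
            last_seen = values[i]
    return values


def replace_nulls(source, replacement):
    """Model the three replacement kinds: policy, column (list), or scalar."""
    if isinstance(replacement, ReplacePolicy):
        return fill_by_policy(source, replacement)
    if isinstance(replacement, list):
        if len(replacement) != len(source):
            raise ValueError("replacement column must match source length")
        # Column case: take the replacement value at the same position.
        return [r if s is None else s for s, r in zip(source, replacement)]
    # Scalar case: the same value is used for every null.
    return [replacement if s is None else s for s in source]


source = [None, 1, None, None, 4, None]
by_column = replace_nulls(source, [10, 20, 30, 40, 50, 60])
by_scalar = replace_nulls(source, 0)
preceding = replace_nulls(source, ReplacePolicy.PRECEDING)
following = replace_nulls(source, ReplacePolicy.FOLLOWING)
```

The model dispatches with `isinstance` at runtime, whereas the Cython fused type resolves the branch at compile time for typed callers; only the `object` fallback resembles the runtime check shown here.
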
@@ -0,0 +1,208 @@
# Copyright (c) 2023-2024, NVIDIA CORPORATION.


from cython.operator import dereference

from libcpp cimport bool
from libcpp.memory cimport unique_ptr
from libcpp.utility cimport move

from cudf._lib.cpp cimport replace as cpp_replace
from cudf._lib.cpp.column.column cimport column

from cudf._lib.cpp.replace import \
    replace_policy as ReplacePolicy  # no-cython-lint

from .column cimport Column
from .scalar cimport Scalar


cpdef Column replace_nulls(Column source_column, ReplacementType replacement):
    """Replace nulls in source_column.

    The values used to replace nulls depend on the type of replacement:
    - If replacement is a Column, the corresponding value from replacement
      is used.
    - If replacement is a Scalar, the same value is used for all nulls.
    - If replacement is a replace_policy, the policy is used to determine
      the replacement value:

        - PRECEDING: The first non-null value that precedes the null is used.
        - FOLLOWING: The first non-null value that follows the null is used.

    For more details, see :cpp:func:`replace_nulls`.

    Parameters
    ----------
    source_column : Column
        The column in which to replace nulls.
    replacement : Union[Column, Scalar, replace_policy]
        If a Column, the values to use as replacements. If a Scalar, the value
        to use as a replacement. If a replace_policy, the policy to use to
        determine the replacement value.

    Returns
    -------
    Column
        A copy of source_column with nulls replaced as specified by
        replacement.
    """
    cdef unique_ptr[column] c_result
    cdef replace_policy policy
    # Due to https://github.com/cython/cython/issues/5984, if this function is
    # called as a Python function (i.e. without typed inputs, which is always
    # true in pure Python files), the type of `replacement` will be `object`
    # instead of `replace_policy`. This is a workaround to handle that case.
    if ReplacementType is object:
        if isinstance(replacement, ReplacePolicy):
            policy = replacement
            with nogil:
                c_result = move(
                    cpp_replace.replace_nulls(source_column.view(), policy)
                )
            return Column.from_libcudf(move(c_result))
        else:
            raise TypeError("replacement must be a Column, Scalar, or replace_policy")

    with nogil:
        if ReplacementType is Column:
            c_result = move(
                cpp_replace.replace_nulls(source_column.view(), replacement.view())
            )
        elif ReplacementType is Scalar:
            c_result = move(
                cpp_replace.replace_nulls(
                    source_column.view(), dereference(replacement.c_obj)
                )
            )
        elif ReplacementType is replace_policy:
            c_result = move(
                cpp_replace.replace_nulls(source_column.view(), replacement)
            )
        else:
            assert False, "Internal error. Please contact pylibcudf developers"
    return Column.from_libcudf(move(c_result))


cpdef Column find_and_replace_all(
    Column source_column,
    Column values_to_replace,
    Column replacement_values,
):
    """Replace all occurrences of values_to_replace with replacement_values.

    For details, see :cpp:func:`find_and_replace_all`.

    Parameters
    ----------
    source_column : Column
        The column in which to replace values.
    values_to_replace : Column
        The column containing values to replace.
    replacement_values : Column
        The column containing replacement values.

    Returns
    -------
    Column
        A copy of source_column with all occurrences of values_to_replace
        replaced by replacement_values.
    """
    cdef unique_ptr[column] c_result
    with nogil:
        c_result = move(
            cpp_replace.find_and_replace_all(
                source_column.view(),
                values_to_replace.view(),
                replacement_values.view(),
            )
        )
    return Column.from_libcudf(move(c_result))


cpdef Column clamp(
    Column source_column,
    Scalar lo,
    Scalar hi,
    Scalar lo_replace=None,
    Scalar hi_replace=None,
):
    """Clamp the values in source_column to the range [lo, hi].

    For details, see :cpp:func:`clamp`.

    Parameters
    ----------
    source_column : Column
        The column to clamp.
    lo : Scalar
        The lower bound of the clamp range.
    hi : Scalar
        The upper bound of the clamp range.
    lo_replace : Scalar, optional
        The value to use for elements that are less than lo. If not specified,
        the value of lo is used.
    hi_replace : Scalar, optional
        The value to use for elements that are greater than hi. If not
        specified, the value of hi is used.

    Returns
    -------
    Column
        A copy of source_column with values clamped to the range [lo, hi].
    """
    if (lo_replace is None) != (hi_replace is None):
        raise ValueError("lo_replace and hi_replace must be specified together")

    cdef unique_ptr[column] c_result
    with nogil:
        if lo_replace is None:
            c_result = move(
                cpp_replace.clamp(
                    source_column.view(),
                    dereference(lo.c_obj),
                    dereference(hi.c_obj),
                )
            )
        else:
            c_result = move(
                cpp_replace.clamp(
                    source_column.view(),
                    dereference(lo.c_obj),
                    dereference(hi.c_obj),
                    dereference(lo_replace.c_obj),
                    dereference(hi_replace.c_obj),
                )
            )
    return Column.from_libcudf(move(c_result))


cpdef Column normalize_nans_and_zeros(Column source_column, bool inplace=False):
    """Normalize NaNs and zeros in source_column.

    For details, see :cpp:func:`normalize_nans_and_zeros`.

    Parameters
    ----------
    source_column : Column
        The column to normalize.
    inplace : bool, optional
        If True, normalize source_column in place. If False, return a new
        column with the normalized values.

    Returns
    -------
    Column or None
        A copy of source_column with NaNs and zeros normalized if inplace is
        False; None if inplace is True, in which case source_column itself is
        modified.
    """
    cdef unique_ptr[column] c_result
    with nogil:
        if inplace:
            cpp_replace.normalize_nans_and_zeros(source_column.mutable_view())
        else:
            c_result = move(
                cpp_replace.normalize_nans_and_zeros(source_column.view())
            )

    if not inplace:
        return Column.from_libcudf(move(c_result))
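
The behavior of `find_and_replace_all`, `clamp`, and `normalize_nans_and_zeros` can be summarized with a pure-Python reference model. This is a sketch under stated assumptions, not the pylibcudf implementation: plain lists stand in for `Column`, `None` stands in for a null and is assumed to pass through unchanged, and the bodies follow a reading of the docstrings in this PR plus libcudf's documented normalization of -NaN to +NaN and -0.0 to +0.0.

```python
import math


def find_and_replace_all(source, values_to_replace, replacement_values):
    """Replace each occurrence of values_to_replace[i] with replacement_values[i]."""
    if len(values_to_replace) != len(replacement_values):
        raise ValueError("values_to_replace and replacement_values must match in length")
    mapping = dict(zip(values_to_replace, replacement_values))
    # Values without a mapping entry are copied through unchanged.
    return [mapping.get(value, value) for value in source]


def clamp(source, lo, hi, lo_replace=None, hi_replace=None):
    """Replace values below lo with lo_replace and values above hi with hi_replace."""
    # Mirrors the check in the wrapper: both replacements or neither.
    if (lo_replace is None) != (hi_replace is None):
        raise ValueError("lo_replace and hi_replace must be specified together")
    if lo_replace is None:
        lo_replace, hi_replace = lo, hi
    result = []
    for value in source:
        if value is None:
            result.append(None)  # Assumption: nulls are left as nulls.
        elif value < lo:
            result.append(lo_replace)
        elif value > hi:
            result.append(hi_replace)
        else:
            result.append(value)
    return result


def normalize_nans_and_zeros(source):
    """Return a copy with every NaN made positive and every -0.0 made +0.0."""
    result = []
    for value in source:
        if value is None:
            result.append(None)
        elif math.isnan(value):
            result.append(math.copysign(math.nan, 1.0))
        elif value == 0.0:
            result.append(0.0)  # -0.0 == 0.0 is True, so this also catches -0.0.
        else:
            result.append(value)
    return result
```

For example, `clamp([1, 5, 9], 3, 7)` yields `[3, 5, 7]`, while `clamp([1, 5, 9], 3, 7, 0, 10)` yields `[0, 5, 10]`. The model only covers the copying path of `normalize_nans_and_zeros`; the `inplace=True` path mutates the column and returns nothing.
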
What does this `=*` syntax mean? This doesn't seem like anything I've seen in Python, C++, or Cython before.

This is the standard Cython syntax for default arguments in pxd files. The actual default value must be specified in the implementation (the pyx file), while the declaration just indicates that a default exists so that callers know what valid invocations look like. We have this in a couple of places in our existing Cython, such as column.pxd and scalar.pxd.