Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transpose API to pylibcudf #16749

Merged
merged 15 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/cudf/source/user_guide/api_docs/pylibcudf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ This page provides API documentation for pylibcudf.
table
traits
transform
transpose
types
unary

Expand Down
6 changes: 6 additions & 0 deletions docs/cudf/source/user_guide/api_docs/pylibcudf/transpose.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
=========
transpose
=========

.. automodule:: pylibcudf.transpose
:members:
27 changes: 5 additions & 22 deletions python/cudf/cudf/_lib/transpose.pyx
Original file line number Diff line number Diff line change
@@ -1,32 +1,15 @@
# Copyright (c) 2020-2024, NVIDIA CORPORATION.

from libcpp.memory cimport unique_ptr
from libcpp.pair cimport pair
from libcpp.utility cimport move

from pylibcudf.libcudf.column.column cimport column
from pylibcudf.libcudf.table.table_view cimport table_view
from pylibcudf.libcudf.transpose cimport transpose as cpp_transpose
import pylibcudf as plc

from cudf._lib.column cimport Column
from cudf._lib.utils cimport columns_from_table_view, table_view_from_columns


def transpose(list source_columns):
"""Transpose m n-row columns into n m-row columns
"""
cdef pair[unique_ptr[column], table_view] c_result
cdef table_view c_input = table_view_from_columns(source_columns)

with nogil:
c_result = move(cpp_transpose(c_input))

# Notice, the data pointer of `result_owner` has been exposed
# through `c_result.second` at this point.
result_owner = Column.from_unique_ptr(
move(c_result.first), data_ptr_exposed=True
)
return columns_from_table_view(
c_result.second,
owners=[result_owner] * c_result.second.num_columns()
input_table = plc.table.Table(
[col.to_pylibcudf(mode="read") for col in source_columns]
)
_, result_table = plc.transpose.transpose(input_table)
return [Column.from_pylibcudf(col) for col in result_table.columns()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madsbk: can you remind me what it means that the result_owner is exposed through the table (c_result.second).

Is it that we have, now, two Buffers that point to the same data, and therefore if we were to spill one, we would need to spill the other?

I think this is right, and so yes, I think we do need (@mroeschke) to have a way of marking a column's data as exposed when we import it from pylibcudf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that we have, now, two Buffers that point to the same data, and therefore if we were to spill one, we would need to spill the other?

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right, and so yes, I think we do need (@mroeschke) to have a way of marking a column's data as exposed when we import it from pylibcudf.

There is a data_ptr_exposed keyword in from_pylibcudf that currently isn't implemented. I think we need to pass that parameter through to the exposed keyword in as_buffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds right

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this was addressed in #16760

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like the necessary parameter was handled there so this PR should be safe to merge now.

1 change: 1 addition & 0 deletions python/pylibcudf/pylibcudf/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ set(cython_sources
table.pyx
traits.pyx
transform.pyx
transpose.pyx
types.pyx
unary.pyx
utils.pyx
Expand Down
2 changes: 2 additions & 0 deletions python/pylibcudf/pylibcudf/__init__.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ from . cimport (
strings,
traits,
transform,
transpose,
types,
unary,
)
Expand Down Expand Up @@ -71,6 +72,7 @@ __all__ = [
"sorting",
"traits",
"transform",
"transpose",
"types",
"unary",
]
2 changes: 2 additions & 0 deletions python/pylibcudf/pylibcudf/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
strings,
traits,
transform,
transpose,
types,
unary,
)
Expand Down Expand Up @@ -83,6 +84,7 @@
"sorting",
"traits",
"transform",
"transpose",
"types",
"unary",
]
27 changes: 27 additions & 0 deletions python/pylibcudf/pylibcudf/tests/test_transpose.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Copyright (c) 2024, NVIDIA CORPORATION.

import pyarrow as pa
import pylibcudf as plc
import pytest


@pytest.mark.parametrize(
"arr",
[
[],
[1, 2, 3],
[1, 2],
[1],
],
)
def test_transpose(arr):
data = {"a": arr, "b": arr}
arrow_tbl = pa.table(data)
plc_tbl = plc.interop.from_arrow(arrow_tbl)
_, plc_result = plc.transpose.transpose(plc_tbl)
arrow_result = plc.interop.to_arrow(plc_result)
if len(arr) == 0:
expected = (0, 0)
else:
expected = (len(data), len(arr))
assert arrow_result.shape == expected
Matt711 marked this conversation as resolved.
Show resolved Hide resolved
5 changes: 5 additions & 0 deletions python/pylibcudf/pylibcudf/transpose.pxd
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
from .table cimport Table


cpdef tuple transpose(Table input_table)
39 changes: 39 additions & 0 deletions python/pylibcudf/pylibcudf/transpose.pyx
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
from libcpp.memory cimport unique_ptr
from libcpp.pair cimport pair
from libcpp.utility cimport move
from pylibcudf.libcudf cimport transpose as cpp_transpose
from pylibcudf.libcudf.column.column cimport column
from pylibcudf.libcudf.table.table_view cimport table_view

from .column cimport Column
from .table cimport Table


cpdef tuple transpose(Table input_table):
"""Transpose a Table.

For details, see :cpp:func:`transpose`.

Parameters
----------
input_table : Table
Table to transpose

Returns
-------
tuple[Column, Table]
Two-tuple of the owner column and transposed table.
"""
cdef pair[unique_ptr[column], table_view] c_result

with nogil:
c_result = move(cpp_transpose.transpose(input_table.view()))

owner_column = Column.from_libcudf(move(c_result.first))
owner_table = Table([owner_column] * c_result.second.num_columns())

return (
owner_column,
Table.from_table_view(c_result.second, owner_table)
)
Matt711 marked this conversation as resolved.
Show resolved Hide resolved
Loading