Dealing with if scalar #315

Closed
MarcoGorelli opened this issue Nov 6, 2023 · 2 comments

Comments

@MarcoGorelli
Contributor

The whole discussion around to_array is quite tricky; see #294 and #307. One big difficulty is that some libraries can stay lazy (e.g. Dask has a lazy array), whereas others can't (a polars LazyFrame doesn't have a to_numpy attribute).

Maybe we can park it temporarily and try to address the (arguably) more important issue of what to do about:

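def select_features(df: DataFrame) -> list[str]:
    features = []
    for column_name in df.column_names:
        # `std()` returns a Scalar, so `if` calls `__bool__` on the comparison result
        if df.col(column_name).std() > 0:
            features.append(column_name)
    return features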

Because as far as I can tell, this call is problematic for all libraries other than purely eager ones. Even Dask, which was mentioned in #294 as an example of a library which can stay lazy in to_array, raises in the call above (see here).

Dask raises here; it doesn't do any implicit computation.
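
To make the failure mode concrete, here is a minimal sketch using a hypothetical toy `LazyScalar` class. It is not Dask's or any other library's real implementation; it only assumes that a lazy scalar keeps comparisons deferred and refuses implicit conversion to `bool`:

```python
class LazyScalar:
    """Hypothetical stand-in for a lazy scalar; not any library's real class."""

    def __init__(self, thunk):
        self._thunk = thunk  # deferred computation, not evaluated yet

    def __gt__(self, other):
        # The comparison stays lazy: it wraps a new deferred computation.
        return LazyScalar(lambda: self._thunk() > other)

    def __bool__(self):
        # Refuse implicit computation instead of silently evaluating.
        raise TypeError("cannot convert a lazy scalar to bool; compute it explicitly")

    def compute(self):
        # Explicit, user-requested evaluation.
        return self._thunk()


std = LazyScalar(lambda: 1.5)  # pretend this came from `df.col(name).std()`

try:
    if std > 0:  # `if` calls `__bool__` on the lazy comparison result
        outcome = "kept"
except TypeError as exc:
    outcome = f"raised: {exc}"

explicit = (std > 0).compute()  # explicit computation works fine
```

The `if` branch never runs: the comparison itself succeeds lazily, and it is only the implicit `bool()` conversion that raises.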

So...what do we do here? Maybe let's try resolving this one, and then return to to_array?

I'll hold off on making suggestions this time; let's let the discussion roll.

@cbourjau
Contributor

cbourjau commented Nov 6, 2023

This seems related to #305 and to what ought to happen if __bool__ is called on a Scalar. Any solution should probably incorporate the lessons learned from the analogous discussion for the array API. In #305 I try to argue for piggy-backing on the array API, which would move this problem out of the scope of the dataframe API 🪄.

@MarcoGorelli
Contributor Author

I think we've resolved this now: __bool__ forces computation or raises (implementation-dependent), and it may be necessary to call persist first (also implementation-dependent).
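
As a rough illustration of that resolution, the sketch below uses a hypothetical toy `Scalar` class. The `implicit` flag and the scalar-level `persist` method are assumptions made for this example, not the standard's actual API. An implementation may let `__bool__` compute directly, or raise until `persist` has been called:

```python
class Scalar:
    """Hypothetical toy scalar; the names and flag here are illustrative only."""

    def __init__(self, thunk, *, implicit=False):
        self._thunk = thunk
        self._implicit = implicit  # True: `__bool__` is allowed to compute

    def persist(self):
        # Compute once and return a scalar that may be used in conditions.
        value = self._thunk()
        return Scalar(lambda: value, implicit=True)

    def __gt__(self, other):
        # The comparison inherits whether implicit computation is allowed.
        return Scalar(lambda: self._thunk() > other, implicit=self._implicit)

    def __bool__(self):
        if not self._implicit:
            raise TypeError("call persist() before using this scalar in a condition")
        return bool(self._thunk())


def select_features(stds):
    # Persisting first makes `if scalar` work on either kind of implementation.
    return [name for name, std in stds.items() if std.persist() > 0]


stds = {"a": Scalar(lambda: 0.0), "b": Scalar(lambda: 2.0)}
selected = select_features(stds)

# An "eager-style" scalar can be used in a condition without persisting.
eager_ok = bool(Scalar(lambda: 2.0, implicit=True) > 0)
```

Calling `persist` unconditionally keeps the calling code portable: it is a cheap extra step where `__bool__` would have computed anyway, and it avoids the exception where it would not.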
