integration with the polars api #32

Closed
BielStela opened this issue Oct 6, 2023 · 6 comments

@BielStela
Contributor

Good morning. First and foremost, congrats on this library, it is a joy to use! I've been playing around with the polars functions and wondered whether they could somehow be used with the expressions API in polars.

To group by parent cell, one can do the following (maybe not the best approach):

(
    df.with_columns(
        pl.col("h3index").map_batches(lambda x: change_resolution(x, h3res))
    )
    .group_by("h3index")
    .agg(pl.col("value").sum())
)

But if we could somehow make change_resolution part of the expressions API, this could be written as:

(
    df.with_columns(
        pl.col("h3index").h3.change_resolution(h3res)
    )
    .group_by("h3index")
    .agg(pl.col("value").sum())
)

My first question is whether my assumption is correct: must the polars functions in this library be treated as user-defined functions in order to integrate them with polars, or is there a better way to use h3ronpy's functions in polars that I'm not seeing?

Hmm, maybe this is more of a polars question than an h3ronpy one, but anyway, here it is!
Thanks!

@nmandery
Owner

nmandery commented Oct 6, 2023

Great to hear you like the library 😄

You make a good point that a tighter integration with the polars API would be desirable. I just had a look at this and it seems easy to implement, at least for functions returning a single series.

See #33 for an initial draft of this, supporting only the first few h3ronpy functions. The unit tests provide some examples. I will try to find some time to push this forward.

@BielStela
Contributor Author

Nice! Using the polars namespace extension API looks like a reasonable idea. I can give it a go if you are busy :).

Also, a bit of a tangent here, but it may be worth taking a look at https://github.com/pola-rs/pyo3-polars (section 1 in the README), which is starting to lay the groundwork for native extensions that can take advantage of the polars machinery (lazy optimizations and parallelism), if I'm not mistaken. I have no clue whether the performance and ergonomics will be worth the effort, though. I guess I will give it a go too so it can be benchmarked.

@nmandery
Owner

nmandery commented Oct 7, 2023

If you want to contribute that would be great. I just pushed one more commit to the PR, but there are still things to do.

I noticed pyo3-polars and it appears quite interesting. I would just want to remain compatible with pandas without having a hard dependency on polars, at least as long as the geo support in polars has not reached some maturity. A performance comparison would be interesting. I would not expect very significant improvements, as Python is just used to pass references to Arrow memory from Rust code to Rust code, but benchmarking always helps ;)

@nmandery
Owner

I am tempted to close this issue, now that #33 has been merged. @BielStela Do you see anything still missing?

@BielStela
Contributor Author

This is awesome @nmandery. I don't see anything missing!

@nmandery
Owner

Great.

BTW: I just had a look at the Vizzuality site. The things you are building there are looking fantastic. 👍
