[FEA] Add support for percent_rank #4004

viadea · 2021-11-02T18:05:19Z

Is your feature request related to a problem? Please describe.
This is a feature request to add support for percent_rank in window operations.

Here is a mini example:

val querytext="""SELECT gender,percent_rank(salary) OVER (PARTITION BY gender ORDER BY salary) from df2"""
sql(querytext).collect
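
For reference, here is a minimal plain-Scala sketch of what percent_rank is expected to return for one window partition: (rank - 1) / (rows in partition - 1), with 0.0 when the partition has a single row. The helper name `percentRank` is hypothetical and is not part of Spark or the RAPIDS plugin. It only illustrates the semantics and assumes the input holds one partition's ORDER BY values with no NaN or null entries.

```scala
// Hypothetical helper for illustration only; not a Spark or RAPIDS API.
// Input: the ORDER BY values of a single window partition.
// Output: percent_rank for each input value, in the input order.
def percentRank(values: Seq[Double]): Seq[Double] = {
  val n = values.size
  val sorted = values.sorted
  values.map { v =>
    // rank = 1 + number of rows strictly smaller than v, so ties share a rank
    val rank = sorted.indexWhere(_ == v) + 1
    // A single-row partition yields 0.0 instead of dividing by zero
    if (n <= 1) 0.0 else (rank - 1).toDouble / (n - 1)
  }
}
```

For the sample data in the full script further down this thread, the "M" partition (salaries 1000.1 and 2000.2) would give 0.0 and 1.0, and the single-row "F" partition would give 0.0.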


viadea · 2021-11-04T01:32:50Z

Full script to reproduce:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val data = Seq(
    Row(Row("Adam ","","Green"),"1","M",1000.1, "2019-01-01",List("Java","Scala")),
    Row(Row("Bob ","Middle","Green"),"2","M",2000.2, "2019-01-02",List("Java","Python")),
    Row(Row("Cathy ","","Green"),"3","F",3000.3, "2019-01-03",List())
)

val schema = (new StructType()
  .add("name",new StructType()
    .add("firstname",StringType)
    .add("middlename",StringType)
    .add("lastname",StringType)) 
  .add("id",StringType)
  .add("gender",StringType)
  .add("salary",DoubleType)
  .add("birthdayStr",StringType)
  .add("language",ArrayType(StringType))
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema)
df.withColumn("birthday", to_date(col("birthdayStr"))).write.format("parquet").mode("overwrite").save("/tmp/testparquet")
val df2 = spark.read.parquet("/tmp/testparquet")
df2.createOrReplaceTempView("df2")
df2.printSchema
val querytext="""SELECT gender,percent_rank(salary) OVER (PARTITION BY gender ORDER BY salary) from df2"""
sql(querytext).collect

Spark Driver log snippet:

        !NOT_FOUND <PercentRank> percent_rank(salary#90) cannot run on GPU because no GPU enabled version of expression class org.apache.spark.sql.catalyst.expressions.PercentRank could be found

viadea added the "feature request" and "? - Needs Triage" labels Nov 2, 2021

Salonijain27 added the "P1" and "cudf_dependency" labels and removed the "? - Needs Triage" label Nov 2, 2021

res-life self-assigned this Nov 3, 2021

res-life mentioned this issue Nov 10, 2021

[FEA] percent_rank in window operations rapidsai/cudf#9644

Closed

NVnavkumar self-assigned this Feb 25, 2022

sameerz added this to the Feb 28 - Mar 18 milestone Mar 2, 2022

NVnavkumar mentioned this issue Mar 9, 2022

Implement percent_rank() on GPU #4924

Merged

res-life removed their assignment Mar 10, 2022

NVnavkumar closed this as completed in #4924 Mar 11, 2022
