[BUG] rlike cannot run on GPU because invalid or unsupported escape character ']' near index 14 #5275

Closed
viadea opened this issue Apr 19, 2022 · 0 comments · Fixed by #5284
Labels
bug Something isn't working

viadea commented Apr 19, 2022

Env:
22.04 snapshot
22.06 snapshot

Minimal reproduction:

from pyspark.sql.functions import expr
from pyspark.sql.types import StructType, StructField, StringType

# Assumes an active SparkSession named `spark` (for example, the pyspark shell).
data2 = [("abc[123]", ""),
    ("xyz[123", "Rose")
  ]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True)
  ])

df = spark.createDataFrame(data=data2, schema=schema)
df.printSchema()
df.show(truncate=False)

# Regex under test; note the escaped closing bracket at the end.
simplified_regex = r"^abc\\[([0-9]+)\\]"

df.withColumn('newcol', expr(f"""   firstname rlike "{simplified_regex}"    """)).collect()

The result is:

[Row(firstname='abc[123]', middlename='', newcol=True), Row(firstname='xyz[123', middlename='Rose', newcol=False)]

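The result is correct, but rlike falls back to the CPU because the escaped "]" in the pattern is reported as an invalid or unsupported escape character.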
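To see where "index 14" in the error message points, here is a small pure-Python sketch. It assumes that the SQL string literal collapses each doubled backslash into a single one before the regex is evaluated (the default Spark SQL literal parsing, as far as I know). The names `effective_regex`, `escape_index`, `rows`, and `matches` are made up for this illustration. The match check uses Python's `re` module, not the Java regex engine that Spark uses, so it is only a rough sanity check.

```python
import re

# Pattern exactly as written in the reproduction (a Python raw string).
simplified_regex = r"^abc\\[([0-9]+)\\]"

# Assumption: the SQL string literal turns each "\\" into a single "\".
effective_regex = simplified_regex.replace("\\\\", "\\")

# Position of the escaped closing bracket in the effective pattern.
escape_index = effective_regex.rfind("\\]")

# Sanity check with Python's `re` (not Java's regex engine).
rows = ["abc[123]", "xyz[123"]
matches = [re.search(effective_regex, s) is not None for s in rows]

print(effective_regex)
print(escape_index)
print(matches)
```

Under that assumption, the backslash that escapes the final "]" sits at offset 14 of the effective pattern, which lines up with the position named in the issue title, and the match results agree with the `newcol` values reported above.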

viadea added the "bug" and "? - Needs Triage" labels on Apr 19, 2022
sameerz removed the "? - Needs Triage" label on Apr 19, 2022
sameerz added this to the Apr 18 - Apr 29 milestone on Apr 19, 2022