[BUG] Profiling tool does not show "Potential Problems" for dataset API in section "SQL Duration and Executor CPU Time Percent" #2930

Closed
viadea opened this issue Jul 14, 2021 · 1 comment
Assignees: tgravescs
Labels: bug, tools

Comments

@viadea
Collaborator

viadea commented Jul 14, 2021

Describe the bug
The profiling tool does not populate the "Potential Problems" column for Dataset API usage in the "SQL Duration and Executor CPU Time Percent" section, even though the "Lambda" keyword is detected and reported in the "SQL Plan HealthCheck" section.

For example:

SQL Duration and Executor CPU Time Percent
+--------+-----------------------+-----+------------+-------------------+------------+------------------+-------------------------+
|appIndex|App ID                 |sqlID|SQL Duration|Contains Dataset Op|App Duration|Potential Problems|Executor CPU Time Percent|
+--------+-----------------------+-----+------------+-------------------+------------+------------------+-------------------------+
|1       |app-11111111111111-0000|0    |2980        |false              |37541       |null              |52.04                    |
|1       |app-11111111111111-0000|1    |543         |true               |37541       |null              |48.71                    |
+--------+-----------------------+-----+------------+-------------------+------------+------------------+-------------------------+
SQL Plan HealthCheck:
Unsupported SQL Plan
+--------+-----+------+--------+---------------------------------------------------------------------------------------------------+
|appIndex|sqlID|nodeID|nodeName|nodeDescription                                                                                    |
+--------+-----+------+--------+---------------------------------------------------------------------------------------------------+
|1       |1    |8     |Filter  |Filter $line21.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$4578/0x00000008019f1840@4b63e04c.apply|
+--------+-----+------+--------+---------------------------------------------------------------------------------------------------+

Steps/Code to reproduce bug

Profile an event log whose query plan contains a Dataset API operation (indicated by the "Lambda" keyword).

Expected behavior

The "Potential Problems" column in the "SQL Duration and Executor CPU Time Percent" section should flag the problematic Dataset API usage.
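
As a rough illustration of the behavior requested here (not the profiling tool's actual implementation), the Python sketch below scans plan nodes for a lambda marker and reports, per sqlID, which nodes would be flagged. The function name `potential_problems`, the dictionary layout, and the regular expression are assumptions made for this example; the first sample node is copied from the "Unsupported SQL Plan" table above, and the second is a made-up node with no lambda.

```python
import re

# Assumption: a typed Dataset lambda appears as "$Lambda$<number>/" in the
# node description, as in the Filter node from the report above.
LAMBDA_MARKER = re.compile(r"\$Lambda\$\d+/")


def potential_problems(plan_nodes):
    """Map each sqlID to the sorted node names whose description contains a lambda."""
    problems = {}
    for node in plan_nodes:
        if LAMBDA_MARKER.search(node["nodeDescription"]):
            problems.setdefault(node["sqlID"], set()).add(node["nodeName"])
    return {sql_id: sorted(names) for sql_id, names in problems.items()}


nodes = [
    # Taken from the "Unsupported SQL Plan" table in this issue.
    {"sqlID": 1, "nodeID": 8, "nodeName": "Filter",
     "nodeDescription": "Filter $line21.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw"
                        "$$Lambda$4578/0x00000008019f1840@4b63e04c.apply"},
    # Hypothetical node without a lambda, added only for contrast.
    {"sqlID": 0, "nodeID": 2, "nodeName": "Project",
     "nodeDescription": "Project [id#0L]"},
]

print(potential_problems(nodes))
```

Under these assumptions, only sqlID 1 would receive an entry in "Potential Problems"; sqlID 0 would stay empty.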

Environment details (please complete the following information)

Using the latest 21.08 snapshot jar for this tool.

@viadea viadea added bug Something isn't working ? - Needs Triage Need team to review and classify labels Jul 14, 2021
@tgravescs tgravescs self-assigned this Jul 14, 2021
@tgravescs tgravescs added tools and removed ? - Needs Triage Need team to review and classify labels Jul 14, 2021
@tgravescs
Collaborator

Dataset usage was never listed as a potential problem. It was only subtracted from the SQL DataFrame duration and then reported in the "Contains Dataset Op" column.
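
One possible reading of this comment, sketched in Python: queries flagged in "Contains Dataset Op" are left out when the SQL DataFrame duration is totaled, and nothing is written to "Potential Problems". The function name `sql_dataframe_duration` and the row layout are hypothetical, and the exclusion rule is an assumption about what "took away from" means, not confirmed tool behavior. The sample values come from the table in the report.

```python
def sql_dataframe_duration(sql_rows):
    """Sum SQL durations, skipping queries flagged as containing a Dataset op.

    Assumption: this mirrors "took away from the sql dataframe duration";
    the real tool may compute the figure differently.
    """
    return sum(row["duration"] for row in sql_rows
               if not row["contains_dataset_op"])


# Rows taken from the "SQL Duration and Executor CPU Time Percent" table.
rows = [
    {"sqlID": 0, "duration": 2980, "contains_dataset_op": False},
    {"sqlID": 1, "duration": 543, "contains_dataset_op": True},
]

print(sql_dataframe_duration(rows))
```

With the two rows from the report, only sqlID 0 contributes to the total under this reading.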

@viadea viadea closed this as completed Jul 14, 2021