Analyze and improve Silver Job Runs performance (Spark 3.3.0) #1230

Draft: wants to merge 12 commits into base `0900_release`


neilbest-db
Contributor

No description provided.

gueniai and others added 3 commits May 22, 2024 16:22
commit a6a13fe
Author: Neil Best <neil.best@databricks.com>
Date:   Thu May 23 16:39:58 2024 -0500

    improve TransformationDescriberTest

commit 1f145aa
Author: Neil Best <neil.best@databricks.com>
Date:   Thu May 23 15:25:29 2024 -0500

    Add descriptive job group IDs and named transformations

    This makes the Spark UI more developer-friendly when analyzing
    Overwatch runs.

    Job group IDs have the form <workspace name>:<OW module name>

    Any use of `.transform(df => df)` may be replaced with
    `.transformWithDescription(nt)`, where the argument `nt` is first
    instantiated as `val nt = NamedTransformation(df => df)`.

    This commit contains one such application of the new extension method.
    (See `val jobRunsAppendClusterName` in `WorkflowsTransforms.scala`.)

    Some logic in `GoldTransforms` falls through to elements of the
    special job-run-action form of Job Group IDs emitted by the platform,
    but the impact is minimal relative to the benefit to Overwatch
    development and troubleshooting.  Even so, this form of Job Group ID
    is still present in the initial Spark events emitted before OW ETL
    modules begin to execute.

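A minimal sketch of the two ideas in the commit message above, under stated assumptions. To run without Spark, `NamedTransformation` and `transformWithDescription` are generic over the value type instead of taking a `DataFrame`; the name is passed explicitly (the commit's `NamedTransformation(df => df)` takes only the function, and how Overwatch derives the name is not shown); and a `ThreadLocal` stands in for the per-thread job description that Spark keeps. In Spark code, the job group ID string would go to `SparkContext.setJobGroup`. The object name `DescriptiveJobGroups` and every helper in it are hypothetical.

```scala
// Hypothetical sketch; not Overwatch's actual implementation.
object DescriptiveJobGroups {

  // Builds an ID of the form <workspace name>:<OW module name>.
  // In Spark, this string would be passed to SparkContext.setJobGroup.
  def jobGroupId(workspaceName: String, moduleName: String): String =
    s"$workspaceName:$moduleName"

  // Stand-in for Spark's thread-local job description property.
  val currentDescription = new ThreadLocal[String]

  // Hypothetical shape: a transformation paired with a readable name.
  final case class NamedTransformation[A](name: String, transformation: A => A)

  // Generic stand-in for the DataFrame extension method: record the
  // description for the current thread, then apply the wrapped function.
  implicit class TransformWithDescriptionOps[A](val value: A) extends AnyVal {
    def transformWithDescription(nt: NamedTransformation[A]): A = {
      currentDescription.set(nt.name)
      nt.transformation(value)
    }
  }
}
```

With Spark on the classpath, the same shape could wrap `Dataset.transform` and call `setJobDescription` on the session's `SparkContext` instead of the `ThreadLocal`; that variant is an assumption, not the code in this PR.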
commit 7158765
Author: Neil Best <neil.best@databricks.com>
Date:   Thu May 23 17:20:20 2024 -0500

    Add extension method to show `DataFrame` records in the log
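One way to route `show`-style output into a log, sketched here as an assumption since the commit's extension method is not shown: Spark's `Dataset.show` prints the rendered table through Scala's `println`, so temporarily redirecting `Console.out` captures it as a `String`. The object and method names below are hypothetical.

```scala
import java.io.ByteArrayOutputStream

// Hypothetical helper; not the extension method added in this commit.
object ShowInLog {
  // Runs `body` with Console.out redirected into a buffer and returns
  // everything it printed. Console.withOut restores the previous stream
  // afterwards, even if `body` throws.
  def captureConsole(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer)(body)
    // PrintStream encodes with the platform default charset, and
    // ByteArrayOutputStream.toString() decodes with the same charset.
    buffer.toString()
  }
}
```

A caller could then write something like `logger.info(ShowInLog.captureConsole(df.show(20, false)))` with whatever logger the project already uses.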
neilbest-db added the optimization (Technical Spark Optimization) label May 29, 2024
neilbest-db added this to the 0.8.2.0 milestone May 29, 2024
neilbest-db self-assigned this May 29, 2024
neilbest-db linked an issue May 29, 2024 that may be closed by this pull request
gueniai and others added 4 commits May 31, 2024 15:03
- bump `scalaVersion`, `sparkVersion` & `delta-core`
- adjust test suite, esp. where DDL no longer includes backticks
- set log level to `WARN` for all tests using `BeforeAndAfterEach`
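A minimal sketch of the per-test log level idea from the list above. So that it runs with only the JDK, it mimics ScalaTest's `beforeEach`/`afterEach` hook pair in a plain trait and swaps in `java.util.logging` (whose `WARNING` level corresponds to `WARN`) for Spark's logging backend. In a real Spark suite, `spark.sparkContext.setLogLevel("WARN")` is one existing call that could go in `beforeEach`. The trait and method names are hypothetical.

```scala
import java.util.logging.{Level, Logger}

// Hypothetical stand-in for ScalaTest's BeforeAndAfterEach hooks.
trait WarnLevelPerTest {
  // "" names the java.util.logging root logger.
  private val logger: Logger = Logger.getLogger("")
  private var savedLevel: Level = null

  // Remember the current level, then quiet everything below WARNING.
  def beforeEach(): Unit = {
    savedLevel = logger.getLevel
    logger.setLevel(Level.WARNING)
  }

  // Restore whatever level was in effect before the test.
  def afterEach(): Unit = logger.setLevel(savedLevel)

  // Runs one test body between the two hooks, as the ScalaTest trait does.
  def withHooks[T](body: => T): T = {
    beforeEach()
    try body finally afterEach()
  }
}
```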
neilbest-db force-pushed the 1228-analyze-and-improve-silver-job-runs-performance branch from 842b977 to 9d728e9 June 12, 2024 19:29
Removed a level of indirection and unnecessary conditional branching in the definition of chained `lookupWhen` transformations.
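A sketch of chaining lookup steps without an extra wrapper or per-step branching at the call site, assuming records are plain `Map`s. Overwatch's real `lookupWhen` operates on DataFrames and its semantics are not reproduced here; the `lookupWhen` below is a hypothetical stand-in that fills one column from a lookup table when the key is found. `scala.Function.chain` is a standard-library helper that composes a sequence of `T => T` functions, applying them in order.

```scala
// Hypothetical sketch; not Overwatch's lookupWhen.
object ChainedLookups {
  type Record = Map[String, String]

  // Fill `targetCol` from `table` using the value in `keyCol`, if both exist.
  def lookupWhen(keyCol: String, targetCol: String, table: Map[String, String]): Record => Record =
    record =>
      record.get(keyCol).flatMap(table.get) match {
        case Some(value) => record.updated(targetCol, value)
        case None        => record
      }

  // Compose all steps once; applying the result runs them left to right,
  // so later steps can use columns filled by earlier ones.
  def chainAll(steps: Seq[Record => Record]): Record => Record =
    Function.chain(steps)
}
```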

Moved definitions so that references to `PipelineTable` objects are in scope rather than passed as arguments.
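A minimal sketch of the refactoring described above, assuming a simplified `PipelineTable` with only a name (the real Overwatch class has more members). The object `ByArgument`, the class `SilverTargets`, and the method `describeRuns` are hypothetical; the point is only that a definition placed where the table is already in scope no longer needs it threaded through as a parameter.

```scala
// Hypothetical stand-in for Overwatch's PipelineTable.
final case class PipelineTable(name: String)

// Before: each helper receives the table it needs as an argument.
object ByArgument {
  def describeRuns(jobRunsTable: PipelineTable): String = s"reads ${jobRunsTable.name}"
}

// After: the helper lives where the table is a member, so callers
// no longer pass it in.
class SilverTargets(val jobRunsTable: PipelineTable) {
  def describeRuns: String = s"reads ${jobRunsTable.name}"
}
```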
neilbest-db changed the base branch from main to 0820_release June 13, 2024 14:26
commit aeef7ff
Author: sourav.banerjee <sourav.banerjee@databricks.com>
Date:   Tue Jun 11 17:59:51 2024 +0530

    Dropped Spec column from snapshot

commit a5c8b54
Author: sourav.banerjee <sourav.banerjee@databricks.com>
Date:   Tue Jun 11 17:52:52 2024 +0530

    Convert all the struct fields inside the 'spec' column for cluster_snapshot_bronze to MapType
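A sketch of why converting struct fields to a map can help, under an assumption the commit does not state: a struct column's schema changes whenever a new field appears, while a `map<string, string>` column keeps one schema for records with different fields. It is shown in plain Scala rather than Spark; in Spark the conversion could be built with `org.apache.spark.sql.functions.map` over field-name literals and cast values. The object and method names are hypothetical, and the field names are illustrative.

```scala
// Hypothetical sketch; not the conversion used for cluster_snapshot_bronze.
object SpecAsMap {
  // Turn (field name, value) pairs into a string-keyed map, dropping nulls,
  // so records with different sets of fields share a single type.
  def toStringMap(fields: Seq[(String, Any)]): Map[String, String] =
    fields.collect { case (name, value) if value != null => name -> value.toString }.toMap
}
```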

sonarcloud bot commented Jun 13, 2024

Quality Gate passed

Issues
- 12 new issues
- 0 accepted issues

Measures
- 0 security hotspots
- No data about coverage
- 0.5% duplication on new code


neilbest-db changed the title from "Analyze and improve silver job runs performance" to "Analyze and improve Silver Job Runs performance" Jun 17, 2024
neilbest-db changed the title from "Analyze and improve Silver Job Runs performance" to "Analyze and improve Silver Job Runs performance (Spark 3.3.0)" Jun 26, 2024
neilbest-db modified the milestones: 0.8.2.0, 0.9.0.0 Jun 29, 2024

sonarcloud bot commented Aug 23, 2024

neilbest-db changed the base branch from 0820_release to 0900_release August 26, 2024 13:44
3 participants