Analyze and improve Silver Job Runs performance (Spark 3.3.0) #1230

Draft: wants to merge 12 commits into base: 0900_release

Commits on May 22, 2024

  1. Initial commit

    gueniai committed May 22, 2024
    da0c55a

Commits on May 28, 2024

  1. Squashed commit of the following:

    commit a6a13fe
    Author: Neil Best <neil.best@databricks.com>
    Date:   Thu May 23 16:39:58 2024 -0500
    
        improve TransformationDescriberTest
    
    commit 1f145aa
    Author: Neil Best <neil.best@databricks.com>
    Date:   Thu May 23 15:25:29 2024 -0500
    
        Add descriptive job group IDs and named transformations
    
        This makes the Spark UI more developer-friendly when analyzing
        Overwatch runs.
    
        Job group IDs have the form <workspace name>:<OW module name>
    
        Any use of `.transform( df => df)` may be replaced with
        `.transformWithDescription( nt)` after instantiating a `val nt =
        NamedTransformation( df => df)` as its argument.
    
        This commit contains one such application of the new extension method.
        (See `val jobRunsAppendClusterName` in `WorkflowsTransforms.scala`.)
    
        Some logic in `GoldTransforms` falls through to elements of the
        special job-run-action form of Job Group IDs emitted by the platform
        but the impact is minimal relative to the benefit to Overwatch
        development and troubleshooting.  Even so this form of Job Group ID is
        still present in initial Spark events before OW ETL modules begin to
        execute.
    neilbest-db committed May 28, 2024
    be53dab
  2. Squashed commit of the following:

    commit 7158765
    Author: Neil Best <neil.best@databricks.com>
    Date:   Thu May 23 17:20:20 2024 -0500
    
        Add extension method to show `DataFrame` records in the log
    neilbest-db committed May 28, 2024
    de091da
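
The named-transformation idea from commit 1f145aa can be sketched without Spark. This is a minimal illustration, not the PR's implementation: `NamedTransformation` here takes an explicit name, `Describer` is a hypothetical hook (with Spark it could forward to `SparkContext.setJobDescription`), the type parameter `A` stands in for `DataFrame`, and the module name `Silver_JobsRuns` is made up for the example.

```scala
object NamedTransformationSketch {

  // Hypothetical stand-in for the PR's NamedTransformation: a function paired
  // with a readable name. Generic in A so the sketch runs without Spark; in
  // Overwatch the wrapped function would be DataFrame => DataFrame.
  final case class NamedTransformation[A](name: String, f: A => A)

  // Hypothetical sink for descriptions. With Spark, an implementation could
  // forward to SparkContext.setJobDescription so the name shows in the Spark UI.
  trait Describer {
    def describe(description: String): Unit
  }

  // Extension method mirroring `.transform(f)`, but announcing the name first.
  implicit class TransformOps[A](value: A) {
    def transformWithDescription(nt: NamedTransformation[A])(implicit d: Describer): A = {
      d.describe(nt.name)
      nt.f(value)
    }
  }

  // Job group IDs of the form <workspace name>:<OW module name>.
  def jobGroupId(workspaceName: String, moduleName: String): String =
    s"$workspaceName:$moduleName"

  // Runs one named transformation and returns (result, recorded descriptions).
  def demo(): (List[Int], List[String]) = {
    val recorded = scala.collection.mutable.ListBuffer.empty[String]
    implicit val describer: Describer = new Describer {
      def describe(description: String): Unit = { recorded += description; () }
    }
    val doubleAll = NamedTransformation[List[Int]]("doubleAll", _.map(_ * 2))
    val result = List(1, 2, 3).transformWithDescription(doubleAll)
    (result, recorded.toList)
  }

  def main(args: Array[String]): Unit = {
    println(jobGroupId("my-workspace", "Silver_JobsRuns")) // module name is hypothetical
    println(demo())
  }
}
```

Keeping the description sink behind a small trait means the same call site can record names in a test and publish them to the Spark UI in a real run.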

Commits on May 31, 2024

  1. Initial commit

    gueniai committed May 31, 2024
    e134bd4

Commits on Jun 4, 2024

  1. da14f88

Commits on Jun 12, 2024

  1. compile for DBR 11.3 LTS

    - bump `scalaVersion`, `sparkVersion` & `delta-core`
    - adjust test suite, esp. where DDL no longer includes backticks
    - set log level to `WARN` for all tests using `BeforeAndAfterEach`
    neilbest-db committed Jun 12, 2024
    cc59ba9
  2. 9d728e9
  3. Delete build.log

    neilbest-db committed Jun 12, 2024
    2b2fe08
  4. Merge remote-tracking branch 'origin/0812_release' into 1228-analyze-and-improve-silver-job-runs-performance

    neilbest-db committed Jun 12, 2024
    5fb52bc

Commits on Jun 13, 2024

  1. Refactor lookups in Silver Job Runs

    Removed a level of indirection and unnecessary conditional branching in the definition of chained `lookupWhen` transformations.
    
    Moved definitions so that references to `PipelineTable` objects are in scope rather than passed as arguments.
    neilbest-db committed Jun 13, 2024
    efdd63f
  2. Squashed commit of the following:

    commit aeef7ff
    Author: sourav.banerjee <sourav.banerjee@databricks.com>
    Date:   Tue Jun 11 17:59:51 2024 +0530
    
        Dropped Spec column from snapshot
    
    commit a5c8b54
    Author: sourav.banerjee <sourav.banerjee@databricks.com>
    Date:   Tue Jun 11 17:52:52 2024 +0530
    
        Convert all the struct fields inside the 'spec' column for cluster_snapshot_bronze to MapType
    neilbest-db committed Jun 13, 2024
    8378ce4
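
The shape of the lookup refactor described in commit efdd63f (lookups that reference tables already in scope and are chained directly) might look roughly like the sketch below. Everything in it is a stand-in chosen so the example runs without Spark: `Row`, `Table`, `lookup`, and the sample tables are hypothetical and are not Overwatch's `lookupWhen` or `PipelineTable`.

```scala
object ChainedLookupSketch {

  // Stand-in types so the sketch runs without Spark; not Overwatch's API.
  type Row = Map[String, String]
  type Table = Map[String, String] // lookup key -> looked-up value

  // Hypothetical in-scope tables, playing the role of PipelineTable references.
  val clusterNames: Table = Map("c1" -> "etl-cluster")
  val jobNames: Table = Map("j1" -> "nightly-load")

  // Fill `target` from `table` keyed by `keyCol`, only when it is still missing.
  def lookup(table: Table, keyCol: String, target: String): Row => Row =
    row =>
      if (row.contains(target)) row
      else row.get(keyCol).flatMap(table.get).fold(row)(v => row + (target -> v))

  // The lookups close over the tables in scope and are chained directly,
  // with no wrapper that takes the tables as arguments or branches on them.
  val enrich: Row => Row = Function.chain(Seq(
    lookup(clusterNames, "cluster_id", "cluster_name"),
    lookup(jobNames, "job_id", "job_name")
  ))

  def main(args: Array[String]): Unit =
    println(enrich(Map("cluster_id" -> "c1", "job_id" -> "j1")))
}
```

Each lookup leaves the row untouched when it finds no match, so the chain needs no conditional logic around individual steps.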

Commits on Aug 23, 2024

  1. 43f5c4c