1030 pipeline validation framework (#1071)
* Initial commit

* 19-Oct-23 : Added Validation Framework

* 19-Oct-23: Customize the message for customer

* 19-Oct-23: Customize the message for customer

* 26-Oct-23: Added OverwatchID filter in the table

* 26-Oct-23: Change for Coding Best Practices

* Added Function Description for validateColumnBetweenMultipleTable

* Added Pattern Matching in Validation

* Convert if-else in validateRuleAndUpdateStatus to case statement as per comment
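
  (The refactor above follows the standard Scala idiom of replacing chained if-else branches with a single match expression. A minimal, hypothetical sketch — the real validateRuleAndUpdateStatus is not shown in this commit:)

  sealed trait RuleOutcome
  case object Passed extends RuleOutcome
  final case class Failed(reason: String) extends RuleOutcome

  // One match expression replaces the chained if-else on the rule outcome.
  def statusMessage(outcome: RuleOutcome): String = outcome match {
    case Passed         => "Validation passed"
    case Failed(reason) => s"Validation failed: $reason"
  }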

* Initial commit

* traceability implemented (#1102)

* traceability implemented

* code review implemented

* missed code implemented (#1105)

* Initial commit

* traceability implemented (#1102)

* traceability implemented

* code review implemented

* missed code implemented

* missed code implemented

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>

* Added proper exception for Spark Stream Gold if progress c… (#1085)

* Initial commit

* 09-Nov-23: Added proper exception for Spark Stream Gold if progress column contains only null in SparkEvents_Bronze

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* Gracefully Handle Exception for NotebookCommands_Gold (#1095)

* Initial commit

* Gracefully Handle Exception for NotebookCommands_Gold

* Convert the check in buildNotebookCommandsFact to a single OR clause

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* code missed in merge (#1120)

* Fix Helper Method to Instantiate Remote Workspaces (#1110)

* Initial commit

* Change getRemoteWorkspaceByPath and getWorkspaceByDatabase to take a RemoteWorkspace

* Remove Unnecessary println Statements

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>

* Ensure we test the write into a partitioned storage_prefix (#1088)

* Initial commit

* Ensure we test the write into a partitioned storage_prefix

* silver warehouse spec fix (#1121)

* added missed copy-pasta (#1129)

* Exclude cluster logs in S3 root bucket (#1118)

* Exclude cluster logs in S3 root bucket

* Omit cluster log paths pointing to s3a as well

* implemented recon (#1116)

* implemented recon

* docs added

* file path change

* review comments implemented

* Added ShuffleFactor to NotebookCommands (#1124)

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* disabled traceability (#1130)

* Added JobRun_Silver in buildClusterStateFact for Cluster E… (#1083)

* Initial commit

* 08-Nov-23: Added JobRun_Silver in buildClusterStateFact for Cluster End Time Imputation

* Impute Terminating Events in CLSF from JR_Silver

* Impute Terminating Events in CLSD

* Impute Terminating Events in CLSD

* Change CLSF to original 0730 version

* Change CLSF to original 0730 version

* Added cluster_spec in CLSD to get job Cluster only

* Make the variable names in buildClusterStateDetail more descriptive

* Make the variable names in buildClusterStateDetail more descriptive

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* Sys table audit log integration (#1122)

* system table integration with audit log

* adding code to resolve issues with response col

* fixed timestamp issue

* adding print statement for from and until time

* adding fix for azure

* removed comments

* removed comments and print statements

* removed comments

* implemented code review comments

* implemented code review comments

* adding review comment

* Sys table integration multi account (#1131)

* added code changes for multi account deployment

* code for multi account system table integration

* Sys table integration multi account (#1132)

* added code changes for multi account deployment

* code for multi account system table integration

* adding code for system table migration check

* changing exception for empty audit log from system table

* adding code to handle sql_endpoint in configs and fix in migration validation (#1133)

* corner case commit (#1134)

* Handle CLSD Cluster Impute when jrcp and clusterSpec are empty (#1135)

* Handle CLSD Cluster Impute when jrcp and clusterSpec are empty

* Exclude last_state from clsd as it is not needed in the logic.

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* Exclude 2011 and 2014 as dependency module for 2019 (#1136)

* Exclude 2011 and 2014 as dependency module for 2019

* Added comment in CLSD for understandability

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* corner case commit (#1137)

* Update version

* adding fix for empty EH config for system tables (#1140)

* corner case commit (#1142)

* adding fix for empty audit log for warehouse_spec_silver (#1141)

* recon columns removed (#1143)

* recon columns removed

* recon columns removed

* Initial Commit

* Added Changes in Validation Framework as per comments added during sprint meeting

* added hotfix for warehouse_spec_silver (#1154)

* Added Multiple RunID check in Validation Framework

* Added Other tables in Validation Framework

* Added Multiple WS ID option in Cross Table Validation

* Added change for Pipeline_report

* Change for Pipeline Report

* Added msg for single table validation

* Added negative msg in HealthCheck Report

* Added Negative Msg for Cross Table Validation

* Added extra filter for total cost validation for CLSF

* Changed as per Comments

* Changed as per the comments

* Added some filter condition for cost validation in clsf

* Added Config for all pipeline runs

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Sriram Mohanty <69749553+sriram251-code@users.noreply.github.com>
Co-authored-by: Aman <91308367+aman-db@users.noreply.github.com>
5 people committed Sep 5, 2024
1 parent dfac816 commit d085074
Showing 6 changed files with 865 additions and 2 deletions.
26 changes: 26 additions & 0 deletions src/main/scala/com/databricks/labs/overwatch/api/ApiMeta.scala
@@ -200,6 +200,32 @@ trait ApiMeta {
jsonObject.toString
}

/**
* Adds traceability meta info to the raw api response.
*
* @param response  the HTTP response returned by the api call
* @param jsonQuery the JSON request body (used as the batch-key filter for POST calls)
* @param queryMap  the query-parameter map (used as the batch-key filter for all other call types)
* @return a string containing the api response and the meta for the api call.
*/
private[overwatch] def enrichAPIResponse(response: HttpResponse[String], jsonQuery: String, queryMap: Map[String, String]): String = {
val filter: String = if (apiCallType.equals("POST")) jsonQuery else {
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
mapper.writeValueAsString(queryMap)
}
val jsonObject = new JSONObject()
val apiTraceabilityMeta = new JSONObject()
apiTraceabilityMeta.put("endPoint", apiName)
apiTraceabilityMeta.put("type", apiCallType)
apiTraceabilityMeta.put("apiVersion", apiV)
apiTraceabilityMeta.put("responseCode", response.code)
apiTraceabilityMeta.put("batchKeyFilter", filter)
jsonObject.put("rawResponse", response.body.trim)
jsonObject.put("apiTraceabilityMeta", apiTraceabilityMeta)
jsonObject.toString
}

}

/**
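For reference, the enriched payload produced by enrichAPIResponse above has this shape (illustrative values only; the endpoint and filter depend on the actual call):

{
  "rawResponse": "<original API response body>",
  "apiTraceabilityMeta": {
    "endPoint": "clusters/events",
    "type": "POST",
    "apiVersion": "2.0",
    "responseCode": 200,
    "batchKeyFilter": "{\"cluster_id\":\"...\"}"
  }
}

For POST calls batchKeyFilter echoes the JSON request body; for all other call types it is the serialized query-parameter map.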
@@ -652,7 +652,7 @@ trait BronzeTransforms extends SparkSessionWrapper {
}
if (Helpers.pathExists(tmpClusterEventsErrorPath)) {
persistErrors(
-        deriveRawApiResponseDF(spark.read.json(tmpClusterEventsErrorPath))
+        spark.read.json(tmpClusterEventsErrorPath)
.withColumn("from_ts", toTS(col("from_epoch")))
.withColumn("until_ts", toTS(col("until_epoch"))),
database,
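The toTS helper used above converts an epoch column to a readable timestamp before the errors are persisted. A minimal stand-in with the same intent (an assumption — the project's actual helper may differ in signature and precision handling):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// Interpret an epoch-millisecond column as a Spark timestamp.
def toTS(epochCol: Column): Column =
  (epochCol / 1000).cast("timestamp")

// Usage, mirroring the hunk above:
// spark.read.json(tmpClusterEventsErrorPath)
//   .withColumn("from_ts", toTS(col("from_epoch")))
//   .withColumn("until_ts", toTS(col("until_epoch")))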
@@ -19,7 +19,7 @@ import org.apache.spark.sql.{AnalysisException, Column, DataFrame}
// Perhaps add the strategy into the Rule definition in the Rules Engine
case class PipelineTable(
name: String,
-  private val _keys: Array[String],
+  private[overwatch] val _keys: Array[String],
config: Config,
incrementalColumns: Array[String] = Array(),
format: String = "delta", // TODO -- Convert to Enum
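The one-line change above widens _keys from private to package-private. In Scala, private[overwatch] makes a member visible to all code under the overwatch package while keeping it hidden from outside callers — a small self-contained sketch of the mechanism (hypothetical names):

package com.databricks.labs.overwatch.pipeline {
  final case class Example(private[overwatch] val _keys: Array[String])
}

package com.databricks.labs.overwatch.validation {
  object KeyReader {
    // Compiles because this object also lives under the overwatch package;
    // code outside com.databricks.labs.overwatch cannot reference _keys.
    def keysOf(e: com.databricks.labs.overwatch.pipeline.Example): Array[String] =
      e._keys
  }
}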
