1030 pipeline validation framework (#1071)
* Initial commit

* 19-Oct-23 : Added Validation Framework

* 19-Oct-23: Customize the message for customer

* 19-Oct-23: Customize the message for customer

* 26-Oct-23: Added OverwatchID filter in the table

* 26-Oct-23: Change for Coding Best Practices

* Added Function Description for validateColumnBetweenMultipleTable

* Added Pattern Matching in Validation

* Convert if-else in validateRuleAndUpdateStatus to case statement as per comment
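
  (The refactor above follows the standard Scala idiom of replacing chained if-else branches with a single match expression. A minimal, hypothetical sketch — the real validateRuleAndUpdateStatus is not shown in this commit:)

  sealed trait RuleOutcome
  case object Passed extends RuleOutcome
  final case class Failed(reason: String) extends RuleOutcome

  // One match expression replaces the chained if-else on the rule outcome.
  def statusMessage(outcome: RuleOutcome): String = outcome match {
    case Passed         => "Validation passed"
    case Failed(reason) => s"Validation failed: $reason"
  }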

* Initial commit

* traceability implemented (#1102)

* traceability implemented

* code review implemented

* missed code implemented (#1105)

* Initial commit

* traceability implemented (#1102)

* traceability implemented

* code review implemented

* missed code implemented

* missed code implemented

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>

* Added proper exception for Spark Stream Gold if progress c… (#1085)

* Initial commit

* 09-Nov-23: Added proper exception for Spark Stream Gold if progress column contains only null in SparkEvents_Bronze

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* Gracefully Handle Exception for NotebookCommands_Gold (#1095)

* Initial commit

* Gracefully Handle Exception for NotebookCommands_Gold

* Convert the check in buildNotebookCommandsFact to a single OR clause

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* code missed in merge (#1120)

* Fix Helper Method to Instantiate Remote Workspaces (#1110)

* Initial commit

* Change getRemoteWorkspaceByPath and getWorkspaceByDatabase to take a RemoteWorkspace

* Remove Unnecessary println Statements

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>

* Ensure we test the write into a partitioned storage_prefix (#1088)

* Initial commit

* Ensure we test the write into a partitioned storage_prefix

* silver warehouse spec fix (#1121)

* added missed copy-pasta (#1129)

* Exclude cluster logs in S3 root bucket (#1118)

* Exclude cluster logs in S3 root bucket

* Omit cluster log paths pointing to s3a as well

* implemented recon (#1116)

* implemented recon

* docs added

* file path change

* review comments implemented

* Added ShuffleFactor to NotebookCommands (#1124)

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* disabled traceability (#1130)

* Added JobRun_Silver in buildClusterStateFact for Cluster E… (#1083)

* Initial commit

* 08-Nov-23: Added JobRun_Silver in buildClusterStateFact for Cluster End Time Imputation

* Impute Terminating Events in CLSF from JR_Silver

* Impute Terminating Events in CLSD

* Impute Terminating Events in CLSD

* Change CLSF to original 0730 version

* Change CLSF to original 0730 version

* Added cluster_spec in CLSD to get job Cluster only

* Make the variable names in buildClusterStateDetail more descriptive

* Make the variable names in buildClusterStateDetail more descriptive

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* Sys table audit log integration (#1122)

* system table integration with audit log

* adding code to resolve issues with response col

* fixed timestamp issue

* adding print statement for from and until time

* adding fix for azure

* removed comments

* removed comments and print statements

* removed comments

* implemented code review comments

* implemented code review comments

* adding review comment

* Sys table integration multi account (#1131)

* added code changes for multi account deployment

* code for multi account system table integration

* Sys table integration multi account (#1132)

* added code changes for multi account deployment

* code for multi account system table integration

* adding code for system table migration check

* changing exception for empty audit log from system table

* adding code to handle sql_endpoint in configs and fix in migration validation (#1133)

* corner case commit (#1134)

* Handle CLSD Cluster Impute when jrcp and clusterSpec are empty (#1135)

* Handle CLSD Cluster Impute when jrcp and clusterSpec are empty

* Exclude last_state from clsd as it is not needed in the logic.

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* Exclude 2011 and 2014 as dependency module for 2019 (#1136)

* Exclude 2011 and 2014 as dependency module for 2019

* Added comment in CLSD for understandability

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>

* corner case commit (#1137)

* Update version

* adding fix for empty EH config for system tables (#1140)

* corner case commit (#1142)

* adding fix for empty audit log for warehouse_spec_silver (#1141)

* recon columns removed (#1143)

* recon columns removed

* recon columns removed

* Initial Commit

* Added Changes in Validation Framework as per comments added during sprint meeting

* added hotfix for warehouse_spec_silver (#1154)

* Added Multiple RunID check in Validation Framework

* Added Other tables in Validation Framework

* Added Multiple WS ID option in Cross Table Validation

* Added change for Pipeline_report

* Change for Pipeline Report

* Added msg for single table validation

* Added negative msg in HealthCheck Report

* Added Negative Msg for Cross Table Validation

* Added extra filter for total cost validation for CLSF

* Changed as per Comments

* Changed as per the comments

* Added some filter condition for cost validation in clsf

* Added Config for all pipeline runs

---------

Co-authored-by: Guenia Izquierdo <guenia.izquierdo@databricks.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Sriram Mohanty <69749553+sriram251-code@users.noreply.github.com>
Co-authored-by: Aman <91308367+aman-db@users.noreply.github.com>
5 people committed Sep 5, 2024
1 parent dfac816 commit d085074
Showing 6 changed files with 865 additions and 2 deletions.
26 changes: 26 additions & 0 deletions src/main/scala/com/databricks/labs/overwatch/api/ApiMeta.scala
@@ -200,6 +200,32 @@ trait ApiMeta {
jsonObject.toString
}

/**
* Adds traceability meta info to the raw api response.
*
* @param response  the HTTP response returned by the api call
* @param jsonQuery the JSON request body (used as the batch-key filter for POST calls)
* @param queryMap  the query-parameter map (used as the batch-key filter for all other call types)
* @return a string containing the api response and the meta for the api call.
*/
private[overwatch] def enrichAPIResponse(response: HttpResponse[String], jsonQuery: String, queryMap: Map[String, String]): String = {
val filter: String = if (apiCallType.equals("POST")) jsonQuery else {
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
mapper.writeValueAsString(queryMap)
}
val jsonObject = new JSONObject()
val apiTraceabilityMeta = new JSONObject()
apiTraceabilityMeta.put("endPoint", apiName)
apiTraceabilityMeta.put("type", apiCallType)
apiTraceabilityMeta.put("apiVersion", apiV)
apiTraceabilityMeta.put("responseCode", response.code)
apiTraceabilityMeta.put("batchKeyFilter", filter)
jsonObject.put("rawResponse", response.body.trim)
jsonObject.put("apiTraceabilityMeta", apiTraceabilityMeta)
jsonObject.toString
}

}

/**
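For reference, the enriched payload produced by enrichAPIResponse above has this shape (illustrative values only; the endpoint and filter depend on the actual call):

{
  "rawResponse": "<original API response body>",
  "apiTraceabilityMeta": {
    "endPoint": "clusters/events",
    "type": "POST",
    "apiVersion": "2.0",
    "responseCode": 200,
    "batchKeyFilter": "{\"cluster_id\":\"...\"}"
  }
}

For POST calls batchKeyFilter echoes the JSON request body; for all other call types it is the serialized query-parameter map.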
@@ -652,7 +652,7 @@ trait BronzeTransforms extends SparkSessionWrapper {
}
if (Helpers.pathExists(tmpClusterEventsErrorPath)) {
persistErrors(
-        deriveRawApiResponseDF(spark.read.json(tmpClusterEventsErrorPath))
+        spark.read.json(tmpClusterEventsErrorPath)
.withColumn("from_ts", toTS(col("from_epoch")))
.withColumn("until_ts", toTS(col("until_epoch"))),
database,
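The toTS helper used above converts an epoch column to a readable timestamp before the errors are persisted. A minimal stand-in with the same intent (an assumption — the project's actual helper may differ in signature and precision handling):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// Interpret an epoch-millisecond column as a Spark timestamp.
def toTS(epochCol: Column): Column =
  (epochCol / 1000).cast("timestamp")

// Usage, mirroring the hunk above:
// spark.read.json(tmpClusterEventsErrorPath)
//   .withColumn("from_ts", toTS(col("from_epoch")))
//   .withColumn("until_ts", toTS(col("until_epoch")))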
@@ -19,7 +19,7 @@ import org.apache.spark.sql.{AnalysisException, Column, DataFrame}
// Perhaps add the strategy into the Rule definition in the Rules Engine
case class PipelineTable(
name: String,
-  private val _keys: Array[String],
+  private[overwatch] val _keys: Array[String],
config: Config,
incrementalColumns: Array[String] = Array(),
format: String = "delta", // TODO -- Convert to Enum
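The one-line change above widens _keys from private to package-private. In Scala, private[overwatch] makes a member visible to all code under the overwatch package while keeping it hidden from outside callers — a small self-contained sketch of the mechanism (hypothetical names):

package com.databricks.labs.overwatch.pipeline {
  final case class Example(private[overwatch] val _keys: Array[String])
}

package com.databricks.labs.overwatch.validation {
  object KeyReader {
    // Compiles because this object also lives under the overwatch package;
    // code outside com.databricks.labs.overwatch cannot reference _keys.
    def keysOf(e: com.databricks.labs.overwatch.pipeline.Example): Array[String] =
      e._keys
  }
}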
