Ov 404 simplify upgrade process v1 #681

Draft
wants to merge 30 commits into base: 0730_release

Conversation

sudharshanraja-db (Contributor)

No description provided.

sonarcloud bot commented Jan 9, 2023

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
6 Code Smells

No coverage information
5.7% Duplication



override def upgrade(): DataFrame = {
val tempDir = s"/tmp/overwatch/upgrade0610_status__ctrl_0x111/${System.currentTimeMillis()}"
Contributor:

This prefix should be taken from prodWorkspace.getConfig.tempWorkingDir by default, but it should be possible to optionally override it in the constructor. Please default it to a null string so the user doesn't have to pass it in.
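A minimal sketch of that constructor-level default, with illustrative names (the Workspace/WorkspaceConfig stubs and the tempDirOverride parameter are assumptions for the example, not the actual Overwatch API):

// Sketch only: Workspace and WorkspaceConfig are stand-in stubs for the real Overwatch types.
case class WorkspaceConfig(tempWorkingDir: String)
case class Workspace(getConfig: WorkspaceConfig)

class UpgradeTo0610Sketch(
    prodWorkspace: Workspace,
    tempDirOverride: String = null // defaults to null so callers don't have to pass it
) {
  // Fall back to the workspace's configured temp working dir unless the caller overrides it.
  private val tempDirPrefix: String =
    Option(tempDirOverride).filter(_.nonEmpty).getOrElse(prodWorkspace.getConfig.tempWorkingDir)

  val tempDir = s"$tempDirPrefix/upgrade0610_status__ctrl_0x111/${System.currentTimeMillis()}"
}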

Comment on lines 29 to 45
val etlDatabaseName = prodWorkspace.database.getDatabaseName
dbutils.fs.mkdirs(tempDir) // init tempDir -- if no errors it wouldn't be created
val blankConfig = new Config()
val currentSchemaVersion = SchemaTools.getSchemaVersion(etlDatabaseName)
val numericalSchemaVersion = getNumericalSchemaVersion(currentSchemaVersion)
val targetSchemaVersion = "0.610"
validateSchemaUpgradeEligibility(currentSchemaVersion, targetSchemaVersion)
validateNumericalSchemaVersion(numericalSchemaVersion, 600, 610)
val upgradeStatus: ArrayBuffer[UpgradeReport] = ArrayBuffer()
val dbrVersion = spark.conf.get("spark.databricks.clusterUsageTags.effectiveSparkVersion")
val dbrMajorV = dbrVersion.split("\\.").head
val dbrMinorV = dbrVersion.split("\\.")(1)
val dbrVersionNumerical = s"$dbrMajorV.$dbrMinorV".toDouble
val initialSourceVersions: concurrent.Map[String, Long] = new ConcurrentHashMap[String, Long]().asScala
val packageVersion = blankConfig.getClass.getPackage.getImplementationVersion.replaceAll("\\.", "").tail.toInt
val startingSchemaVersion = SchemaTools.getSchemaVersion(etlDatabaseName).split("\\.").takeRight(1).head.toInt
validateSchemaAndPackageVersion(startingSchemaVersion, packageVersion, 600, 610)
Contributor:

Most of these items are common across all upgrades, such as the source and target versions. The only exception is the DBR version handling for the 0610 upgrade. Can all of the derived vals be moved to the abstract class so they don't have to be defined every time?
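As a rough sketch of that refactor (the class and member names here are illustrative assumptions, not the real Overwatch hierarchy), the shared derivations could live in the abstract parent and each concrete upgrade would only declare its own version window:

// Sketch: UpgradeHandler and its members are illustrative, not the actual Overwatch abstractions.
abstract class UpgradeHandler(val etlDatabaseName: String) {
  // Each concrete upgrade declares its version window once.
  def targetSchemaVersion: String
  def minSourceVersion: Int
  def maxSourceVersion: Int

  // Common derivations live here so every upgrade script doesn't redefine them.
  protected def getNumericalSchemaVersion(version: String): Int =
    version.split("\\.").last.toInt
}

class Sketch0610Upgrade(etlDatabaseName: String) extends UpgradeHandler(etlDatabaseName) {
  override val targetSchemaVersion: String = "0.610"
  override val minSourceVersion: Int = 600
  override val maxSourceVersion: Int = 610
  // Only the 0610-specific logic (e.g. the DBR version check) would remain in this subclass.
}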

Comment on lines 47 to 49
(startStep to endStep).foreach {
case 1 =>
val stepMsg = Some("Step 1: Upgrade Schema - Job Status Silver")
Contributor:

I want to try to clean this up so that all upgrades ultimately submit a map of step number to upgradeStepFunction to the UpgradeHandler. Hopefully it can look something like:

private def 0610Step1(...) = ???
private def 0610Step2(...) = ???

val upgradeMap(stepNumber -> upgradeFunc, ...)
upgrade(upgradeMap)

Something like this. The above is just pseudocode, obviously, but hopefully it's clear enough. The idea is that the dev creates n functions and then passes them to the upgrade; I think this will make the code much cleaner and easier to navigate. Thoughts?
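A hedged Scala sketch of that shape might look like the following (the report type, step names, and handler signature are placeholders, and the step identifiers are renamed since Scala methods can't literally start with a digit):

// Sketch only: StepReport and the handler signature are placeholders, not the real API.
case class StepReport(step: Int, status: String)

object Upgrade0610Steps {
  // The dev authors one small function per step...
  private def step1JobStatusSilver(): StepReport = StepReport(1, "SUCCESS")
  private def step2ClusterSpecSilver(): StepReport = StepReport(2, "SUCCESS")

  // ...and the handler owns iteration, ordering, and reporting.
  def upgrade(upgradeMap: Map[Int, () => StepReport]): Seq[StepReport] =
    upgradeMap.toSeq.sortBy(_._1).map { case (_, stepFunc) => stepFunc() }

  def run(): Seq[StepReport] = {
    val upgradeMap: Map[Int, () => StepReport] = Map(
      1 -> (() => step1JobStatusSilver()),
      2 -> (() => step2ClusterSpecSilver())
    )
    upgrade(upgradeMap)
  }
}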

Contributor:

Inside the upgrade handler, all the exception handling should be encapsulated, meaning all the try/catch/finally logic should live in the abstract class or executor, not in the actual upgrade script.
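A small sketch of what that centralization could look like, assuming a hypothetical executor (the names, report shape, and logging below are illustrative, not the real handler):

// Sketch only: the outcome type and logging are placeholders for the real handler's behavior.
object UpgradeExecutorSketch {
  case class StepOutcome(step: Int, status: String, error: Option[String] = None)

  // All try/catch/finally lives in the executor; the upgrade script's step functions
  // just do the work and throw on failure.
  def runStep(stepNumber: Int, stepFunc: () => Unit): StepOutcome = {
    try {
      stepFunc()
      StepOutcome(stepNumber, "SUCCESS")
    } catch {
      case e: Exception => StepOutcome(stepNumber, "FAILED", Some(e.getMessage))
    } finally {
      // Shared cleanup (temp dir removal, logging, etc.) would also be centralized here.
      println(s"Completed upgrade step $stepNumber")
    }
  }
}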

Comment on lines 54 to 57
checkIfTargetExists(etlDatabaseName, targetName)
initialSourceVersions.put(targetName, Helpers.getLatestTableVersionByName(s"${etlDatabaseName}.${targetName}"))
val jobSilverDF = spark.table(s"${etlDatabaseName}.${targetName}")
removeNestedColumnsAndSaveAsTable(jobSilverDF,"new_settings", Array("tasks", "job_clusters"),etlDatabaseName,targetName)
Contributor:

This should be the entirety of the step 1 upgrade func for the 0610 upgrade. Hopefully this is all the code a dev building the upgrade needs to write to complete 0610 step 1.
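To illustrate, under that design the 0610 step-1 function might reduce to something like this (the trait and helper signatures below are assumptions mirroring the hunk above, not the real Overwatch utilities):

import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch only: UpgradeHelpers and its members are illustrative stand-ins for the real utilities.
trait UpgradeHelpers {
  def spark: SparkSession
  def checkIfTargetExists(db: String, table: String): Unit
  def getLatestTableVersionByName(fqTableName: String): Long
  def removeNestedColumnsAndSaveAsTable(
      df: DataFrame, parentCol: String, dropCols: Array[String], db: String, table: String): Unit
}

trait Step1Sketch { self: UpgradeHelpers =>
  // The entire 0610 step-1 body as the dev would author it: no boilerplate, just the work.
  protected def step1JobStatusSilver(etlDatabaseName: String, targetName: String): Unit = {
    checkIfTargetExists(etlDatabaseName, targetName)
    val initialVersion = getLatestTableVersionByName(s"$etlDatabaseName.$targetName") // recorded by the handler for rollback
    val jobSilverDF = spark.table(s"$etlDatabaseName.$targetName")
    removeNestedColumnsAndSaveAsTable(jobSilverDF, "new_settings", Array("tasks", "job_clusters"), etlDatabaseName, targetName)
  }
}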


class UpgradeExecutor(val etldbname: String, val fromVersion: Int, val toVersion: Int, val executeSingleStep: Int = -1) {

val versionMap = Map("600-609" -> classOf[UpgradeTo0610], "610-699" -> classOf[UpgradeTo0700])
Contributor:

Would you have a versionMap for each upgrade path? Is this just an example for 1 of them?

sudharshanraja-db and others added 23 commits April 11, 2023 20:51
* initial commit

* Changing Serverless to High-Concurrency (#706)

* Changing Serverless to High-Concurrency

* minor changes

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Spark events bronze par session (#678)

* initial commit

* Test Spark per session

* notes from meeting

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* pr review implemented

* 686 - SparkEvents Executor ID Schema Handler

* global Session - initialize configs

* Concurrent Writes - Table Lock for Parallelized Loads (#691)

* initi commit -- working

* code modified preWriteActions added

* re-instated perform retry for legacy deployments and added comments

* added logging details

* clear sessionsMap on batch runner

---------

Co-authored-by: sriram251-code <sriram.mohanty@databricks.com>

* minor fixes

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* adding new snapshots to bronze layer (#684)

* adding new snapshots to bronze layer

* changed all the single asset name to plural

* adding transform function for new bronze snaps

* changes applied to improve schema quality

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* apiURL "/" removed and dbsql added (#699)

* bug fix

* bug fix

* bug fix

* bug fix

* Improve acquisition of Cloud Provider and OrgID (#708)

* Improve acquisition of Cloud Provider and OrgID

* Improve acquisition of Cloud Provider and OrgID

* Modularize getOrgID function

* removed old commented version of code

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Update SilverTransforms.scala (#703)

* Overwatch on photon broadcast exchange perf issue (#705)

* Change the Join to Shuffle_hash Join for collectEventLogPaths

* Change the Join to Shuffle_hash Join for collectEventLogPaths

* adding pagination logic for job-runs api (#723)

* 729 - enable clusterEvents merge Insert (#730)

* enable clusterEvents merge Insert

* added comments

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* auditRaw - mergeInsertOnly (#738)

* enable clusterEvents merge Insert

* added comments

* 737 - dateGlobFix and auditLogRaw mergeInserts

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* implemented (#752)

* asStrings implemented for apicallv2 (#707)

* 749 fill meta improved (#753)

* 749 fill meta improved

* put tsPartVal in clsf back to 16

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* cleaning jobs_snap_bronze new_cluster field (#732)

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* dbu and cost calculations fixed (#760)

* dbu calcs corrected

* readd aliases

* add runtime_engine to fillforward

* added a few comments to funcs

* corrected workerDBU Cost Value

* enabled remote getWorkspaceByDatabase (#754)

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* improved first run Impute clusterSpec (#759)

* excluded scope enhanced (#740)

* excluded scope enhanced

* review comment implemented

* modified lowerCase logic

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* adding temp location, start and end time in jobs runs list api (#755)

* adding temp location, start and end time in jobs runs list api

* change in jobRunsList api call

* removed default apiVersion from new apply method

* adding fix for jobs runs list api

* adding code to cleanse duplicate cols in JobRunsList transform, and added new bronze snapshots in target

* reading mount source from csv implemented (#695)

* reading mount source from csv implemented

* driver workspace should not call search/mount to get source

* review comment implemented

* review comment implemented

* Reading config from delta implemented. (#713)

* reading config from delta implemented.
skip mount point check for AWS added.

* review comment implemented

* review comment implemented

* review comment implemented

* review comment implemented

* review comment implemented

* merge conflict removed

* shuffle partition changed to String (#717)

* shuffle partition changed to String

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* test cases added

* test cases added

* adding generic api calls function (#756)

* adding generic api calls function

* adding an empty map as return in APIMeta Trait for def getAPIJsonQuery

* adding function getParallelAPIParams

* implemented code review comments

* removed commented lines

* one workspace instance per workspace deployment (#774)

* adding cluster type in jrcp view (#778)

* improved spark conf handler and optimized confs (#773)

* mount mapping validation added (#777)

* mount mapping validation added

* review comments implemented

* review comments implemented

* review comments implemented

* review comments implemented

* Integration Testing - Bug Fixes (#782)

* added persistAndLoad to all writes with tableLocking

* don't perform data validation if path validation fails -- protects first run failures especially

* fix excludedScopes

* null config handlers plus proxy scope,key error handler

* added persistAndLoad to all writes with tableLocking

* don't perform data validation if path validation fails -- protects first run failures especially

* fix excludedScopes

* null config handlers plus proxy scope,key error handler

* cleanup

* debugging

* fixed futures executions

* additional fixes

* dbu cost fix

* getOrgID bug fix

* target exists enhancement for delta target path validation

* getWorkspaceByDatabase -- cross-cloud remote workspace enabled

* added experimental flag to jrsnapshot and enabled manual module disabling

* rollback and module mapping

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <109206082+souravbaner-da@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Sriram Mohanty <69749553+sriram251-code@users.noreply.github.com>
Co-authored-by: Aman <91308367+aman-db@users.noreply.github.com>
* initial commit

* 788 bug aws single tenant multiworkspace deployment noneget (#798)

* Change Validate for MWS logic

* removed debug print

---------

Co-authored-by: sourav.banerjee <sourav.banerjee@databricks.com>

* disable autoOptimizeShuffle (#795)

* disable autoOptimizeShuffle

* re-enabled optimizeShuffleParts

* fixed exists tests (#799)

* Enable API Error Reporting to PipReport and Safe Failure (#796)

* possible solution

* complete implementation

* improved runtime calcs

* improved fail message stdout

---------

Co-authored-by: sourav.banerjee <sourav.banerjee@databricks.com>
* initial commit

* Refractor InitializerFunctions.scala

* Refractor InitializerFunctions.scala

* Change Scala Sources Name

* Refractor InitializerFunctions.scala

* Refractor InitializerFunctions.scala

* Added Initializerv2.scala

* Added Initializerv2.scala

* Changed as per Sriram comment

* Changed as per Sriram comment

* dropped Initializer Deprecated

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* initial commit

* Refractor Initializer (#683)

* initial commit

* Refractor InitializerFunctions.scala

* Refractor InitializerFunctions.scala

* Change Scala Sources Name

* Refractor InitializerFunctions.scala

* Refractor InitializerFunctions.scala

* Added Initializerv2.scala

* Added Initializerv2.scala

* Changed as per Sriram comment

* Changed as per Sriram comment

* dropped Initializer Deprecated

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Change Job Trigger type to Triggered

* Change Job Trigger type to Triggered

* Change Job Trigger type to Triggered

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* initial commit

* Refractor Initializer (#683)

* initial commit

* Refractor InitializerFunctions.scala

* Refractor InitializerFunctions.scala

* Change Scala Sources Name

* Refractor InitializerFunctions.scala

* Refractor InitializerFunctions.scala

* Added Initializerv2.scala

* Added Initializerv2.scala

* Changed as per Sriram comment

* Changed as per Sriram comment

* dropped Initializer Deprecated

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* gcp integration added

* gcp integration added

* minor updates from daniel

* review comment implemented

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <109206082+souravbaner-da@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* code changes completed

* column name changed from etl_storage_prefix to storage prefix

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
…er. (#679)

Co-authored-by: Carson Wilkins <carson.wilkins@databricks.com>
* code changes completed

* code changes completed

* code changes completed

* code changes completed
* initial 0713 commit

* handled null AccumUpates
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
* api namespace change

* review comments implemented
* getParams implemented

* review comments implemented
* code implemeted

* documentation added

* implemented

* review comments implemented

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* Added Logic to create PipReport

* suggested approach for overriding signatures

* added a few enhancements

* minor updates

* removed table import and updated table reference to spark.table

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
mohanbaabu1996 and others added 2 commits April 11, 2023 20:59
* Update azure/aws_instance_details.csv

* Update Azure_instance_details.csv

Updated F4 to F4s for the instance Standard_F4s. Similarly for F8s and F16s.

sonarcloud bot commented Apr 11, 2023

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
60 Code Smells

No coverage information
5.2% Duplication

sudharshanraja-db (Contributor, Author):

closes #404

GeekSheikh changed the base branch from main to 0730_release on April 28, 2023 at 14:45.
gueniai requested reviews from gueniai and souravbaner-da and removed the review request for GeekSheikh on August 10, 2023 at 14:23.
gueniai added this to the 0.7.3.0 milestone on Aug 10, 2023.
gueniai added the enhancement (New feature or request) label on Aug 10, 2023.

CLAassistant commented Nov 27, 2023

CLA assistant check
All committers have signed the CLA.

gueniai modified the milestone from 0.7.3.0 to backlog on Nov 29, 2023.
neilbest-db self-assigned this on Apr 23, 2024.
neilbest-db marked this pull request as draft on May 15, 2024 at 21:17.
Labels: enhancement (New feature or request)
Projects: None yet
9 participants