cleaning jobs_snap_bronze new_cluster field #732

Merged: GeekSheikh merged 1 commit into 0711_release from 731_jsnap_structs on Feb 17, 2023


GeekSheikh (Contributor)

closes #731

sonarcloud bot commented Feb 12, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
0.0% duplication

GeekSheikh marked this pull request as ready for review February 12, 2023 15:43
GeekSheikh linked an issue Feb 13, 2023 that may be closed by this pull request
GeekSheikh requested review from sriram251-code and removed request for Sriram-databricks February 16, 2023 20:55
brij-raghuwanshi-db (Contributor) left a comment:

Looks good.

GeekSheikh merged commit 4c974a1 into 0711_release Feb 17, 2023
GeekSheikh deleted the 731_jsnap_structs branch February 17, 2023 17:19
GeekSheikh added a commit that referenced this pull request Feb 25, 2023
* initial commit

* Changing Serverless to High-Concurrency (#706)

* Changing Serverless to High-Concurrency

* minor changes

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Spark events bronze per session (#678)

* initial commit

* Test Spark per session

* notes from meeting

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* pr review implemented

* 686 - SparkEvents Executor ID Schema Handler

* global Session - initialize configs

* Concurrent Writes - Table Lock for Parallelized Loads (#691)

* initial commit -- working

* code modified preWriteActions added

* re-instated perform retry for legacy deployments and added comments

* added logging details

* clear sessionsMap on batch runner

---------

Co-authored-by: sriram251-code <sriram.mohanty@databricks.com>

* minor fixes

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* adding new snapshots to bronze layer (#684)

* adding new snapshots to bronze layer

* changed all the single asset name to plural

* adding transform function for new bronze snaps

* changes applied to improve schema quality

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* apiURL "/" removed and dbsql added (#699)

* bug fix

* bug fix

* bug fix

* bug fix

* Improve acquisition of Cloud Provider and OrgID (#708)

* Improve acquisition of Cloud Provider and OrgID

* Improve acquisition of Cloud Provider and OrgID

* Modularize getOrgID function

* removed old commented version of code

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Update SilverTransforms.scala (#703)

* Overwatch on photon broadcast exchange perf issue (#705)

* Change the Join to Shuffle_hash Join for collectEventLogPaths

* Change the Join to Shuffle_hash Join for collectEventLogPaths

* adding pagination logic for job-runs api (#723)

* 729 - enable clusterEvents merge Insert (#730)

* enable clusterEvents merge Insert

* added comments

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* auditRaw - mergeInsertOnly (#738)

* enable clusterEvents merge Insert

* added comments

* 737 - dateGlobFix and auditLogRaw mergeInserts

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* implemented (#752)

* asStrings implemented for apicallv2 (#707)

* 749 fill meta improved (#753)

* 749 fill meta improved

* put tsPartVal in clsf back to 16

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* cleaning jobs_snap_bronze new_cluster field (#732)

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* dbu and cost calculations fixed (#760)

* dbu calcs corrected

* re-add aliases

* add runtime_engine to fillforward

* added a few comments to funcs

* corrected workerDBU Cost Value

* enabled remote getWorkspaceByDatabase (#754)

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* improved first run Impute clusterSpec (#759)

* excluded scope enhanced (#740)

* excluded scope enhanced

* review comment implemented

* modified lowerCase logic

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* adding temp location, start and end time in jobs runs list api (#755)

* adding temp location, start and end time in jobs runs list api

* change in jobRunsList api call

* removed default apiVersion from new apply method

* adding fix for jobs runs list api

* adding code to cleanse duplicate cols in JobRunsList transform, and added new bronze snapshots in target

* reading mount source from csv implemented (#695)

* reading mount source from csv implemented

* driver workspace should not call search/mount to get source

* review comment implemented

* review comment implemented

* Reading config from delta implemented. (#713)

* reading config from delta implemented.
skip mount point check for AWS added.

* review comment implemented

* review comment implemented

* review comment implemented

* review comment implemented

* review comment implemented

* merge conflict removed

* shuffle partition changed to String (#717)

* shuffle partition changed to String

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* test cases added

* test cases added

* adding generic api calls function (#756)

* adding generic api calls function

* adding an empty map as return in APIMeta Trait for def getAPIJsonQuery

* adding function getParallelAPIParams

* implemented code review comments

* removed commented lines

* one workspace instance per workspace deployment (#774)

* adding cluster type in jrcp view (#778)

* improved spark conf handler and optimized confs (#773)

* mount mapping validation added (#777)

* mount mapping validation added

* review comments implemented

* review comments implemented

* review comments implemented

* review comments implemented

* Integration Testing - Bug Fixes (#782)

* added persistAndLoad to all writes with tableLocking

* don't perform data validation if path validation fails -- protects against first-run failures especially

* fix excludedScopes

* null config handlers plus proxy scope,key error handler

* added persistAndLoad to all writes with tableLocking

* don't perform data validation if path validation fails -- protects against first-run failures especially

* fix excludedScopes

* null config handlers plus proxy scope,key error handler

* cleanup

* debugging

* fixed futures executions

* additional fixes

* dbu cost fix

* getOrgID bug fix

* target exists enhancement for delta target path validation

* getWorkspaceByDatabase -- cross-cloud remote workspace enabled

* added experimental flag to jrsnapshot and enabled manual module disabling

* rollback and module mapping

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <109206082+souravbaner-da@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Sriram Mohanty <69749553+sriram251-code@users.noreply.github.com>
Co-authored-by: Aman <91308367+aman-db@users.noreply.github.com>
aman-db added a commit that referenced this pull request Mar 28, 2023
sudharshanraja-db pushed a commit that referenced this pull request Apr 11, 2023
Development

Issues this pull request may close:

[BUG] Jobs_Snap_Bronze - new_cluster not cleansed

3 participants