
Spark events bronze par session #678

Merged
GeekSheikh merged 14 commits into 0711_release from sparkEventsBronze_ParSession on Feb 3, 2023


sriram251-code (Contributor) commented on Jan 4, 2023:

closes #686
closes #688

sriram251-code added the enhancement (New feature or request) label on Jan 4, 2023
sriram251-code added this to the 0.7.1.1 milestone on Jan 4, 2023
sriram251-code self-assigned this on Jan 4, 2023
GeekSheikh (Contributor) left a comment:

Don't forget to update build.sbt to version 0.7.1.1, not 0.7.1.0.
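For reference, the version bump requested here is normally a one-line change. A minimal sketch, assuming the project declares its version with the plain sbt version setting (the actual build.sbt layout is not shown in this PR):

```scala
// build.sbt (fragment). Assumption: the version is declared with the plain
// `version` setting; the real file may use `ThisBuild / version` instead.
version := "0.7.1.1"
```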

Comment on lines +387 to +388
//TODO clearcache will clear global cache multithread performance issue
// spark.catalog.clearCache()
Contributor:

REMINDER - We need to address this before merging.

Contributor (PR author):

ok

Contributor:

created #722
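The TODO in this thread reflects a real trade-off: spark.catalog.clearCache() removes all cached tables from the in-memory cache, and that cache is shared by every session on the same Spark application, so one parallel worker calling it can evict data other workers still need. One possible way to scope cleanup, sketched here under assumptions, is for each session to record how to release what it cached and then release only its own entries. SessionCacheRegistry and its methods are hypothetical names, not Overwatch code; in Spark the release callback would typically wrap something like df.unpersist().

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}
import scala.jdk.CollectionConverters._

// Hypothetical sketch (not Overwatch code): remember, per session, how to
// release each item that session cached, so cleanup can target one session
// instead of calling the application-wide spark.catalog.clearCache().
final class SessionCacheRegistry {
  private val releases =
    new ConcurrentHashMap[String, ConcurrentLinkedQueue[() => Unit]]()

  /** Record a release callback (e.g. () => df.unpersist() in Spark) for a session. */
  def register(sessionId: String, release: () => Unit): Unit = {
    releases
      .computeIfAbsent(sessionId, _ => new ConcurrentLinkedQueue[() => Unit]())
      .add(release)
    ()
  }

  /** Run and forget only this session's callbacks; returns how many ran. */
  def clear(sessionId: String): Int = {
    val queue = releases.remove(sessionId)
    if (queue == null) 0
    else {
      val callbacks = queue.asScala.toList
      callbacks.foreach(release => release())
      callbacks.size
    }
  }

  /** Number of callbacks still registered for a session. */
  def pending(sessionId: String): Int =
    Option(releases.get(sessionId)).map(_.size).getOrElse(0)
}
```

Because each session only touches its own queue, clearing one session leaves every other session's cached data in place.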

GeekSheikh linked an issue on Jan 9, 2023 that may be closed by this pull request
GeekSheikh and others added 2 commits on February 2, 2023 at 16:56
* initial commit -- working

* code modified, preWriteActions added

* reinstated perform retry for legacy deployments and added comments

* added logging details

* clear sessionsMap on batch runner

---------

Co-authored-by: sriram251-code <sriram.mohanty@databricks.com>
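The table-lock commit above serializes writes to the same target table when modules load in parallel. As a rough illustration of that idea only (TableLocks, withTableLock and isHeld are hypothetical names, and Overwatch's actual locking mechanism is not shown in this PR), a per-table lock map in plain Scala could look like this:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantLock

// Hypothetical sketch (not Overwatch code) of table-level locking for
// parallelized loads: writes to the same table are serialized, while writes
// to different tables can proceed concurrently.
object TableLocks {
  // One lock per table name, created atomically on first use.
  private val locks = new ConcurrentHashMap[String, ReentrantLock]()

  /** Run `write` while holding the lock for `tableName`; always unlocks. */
  def withTableLock[T](tableName: String)(write: => T): T = {
    val lock = locks.computeIfAbsent(tableName, _ => new ReentrantLock())
    lock.lock()
    try write
    finally lock.unlock()
  }

  /** True if some thread currently holds the lock for `tableName`. */
  def isHeld(tableName: String): Boolean =
    Option(locks.get(tableName)).exists(_.isLocked)
}
```

A ReentrantLock is used so a code path that already holds a table's lock can acquire it again without deadlocking, and the finally block releases the lock even when the write fails.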
sonarcloud bot commented on Feb 2, 2023:

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 5 (rating A)
Coverage: no information
Duplication: 0.0%

GeekSheikh (Contributor) left a comment:

Looks like this will work; will do complete testing on the release branch.

GeekSheikh merged commit 3903d27 into 0711_release on Feb 3, 2023
GeekSheikh deleted the sparkEventsBronze_ParSession branch on February 3, 2023 at 14:37
GeekSheikh added a commit that referenced this pull request on Feb 25, 2023
* initial commit

* Changing Serverless to High-Concurrency (#706)

* Changing Serverless to High-Concurrency

* minor changes

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Spark events bronze par session (#678)

* initial commit

* Test Spark per session

* notes from meeting

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* persession implemented

* pr review implemented

* 686 - SparkEvents Executor ID Schema Handler

* global Session - initialize configs

* Concurrent Writes - Table Lock for Parallelized Loads (#691)

* initial commit -- working

* code modified, preWriteActions added

* reinstated perform retry for legacy deployments and added comments

* added logging details

* clear sessionsMap on batch runner

---------

Co-authored-by: sriram251-code <sriram.mohanty@databricks.com>

* minor fixes

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* adding new snapshots to bronze layer (#684)

* adding new snapshots to bronze layer

* changed all the single asset name to plural

* adding transform function for new bronze snaps

* changes applied to improve schema quality

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* apiURL "/" removed and dbsql added (#699)

* bug fix

* bug fix

* bug fix

* bug fix

* Improve acquisition of Cloud Provider and OrgID (#708)

* Improve acquisition of Cloud Provider and OrgID

* Improve acquisition of Cloud Provider and OrgID

* Modularize getOrgID function

* removed old commented version of code

---------

Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* Update SilverTransforms.scala (#703)

* Overwatch on photon broadcast exchange perf issue (#705)

* Change the Join to Shuffle_hash Join for collectEventLogPaths

* Change the Join to Shuffle_hash Join for collectEventLogPaths

* adding pagination logic for job-runs api (#723)

* 729 - enable clusterEvents merge Insert (#730)

* enable clusterEvents merge Insert

* added comments

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* auditRaw - mergeInsertOnly (#738)

* enable clusterEvents merge Insert

* added comments

* 737 - dateGlobFix and auditLogRaw mergeInserts

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* implemented (#752)

* asStrings implemented for apicallv2 (#707)

* 749 fill meta improved (#753)

* 749 fill meta improved

* put tsPartVal in clsf back to 16

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* cleaning jobs_snap_bronze new_cluster field (#732)

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* dbu and cost calculations fixed (#760)

* dbu calcs corrected

* readd aliases

* add runtime_engine to fillforward

* added a few comments to funcs

* corrected workerDBU Cost Value

* enabled remote getWorkspaceByDatabase (#754)

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

* improved first run Impute clusterSpec (#759)

* excluded scope enhanced (#740)

* excluded scope enhanced

* review comment implemented

* modified lowerCase logic

---------

Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>

* adding temp location, start and end time in jobs runs list api (#755)

* adding temp location, start and end time in jobs runs list api

* change in jobRunsList api call

* removed default apiVersion from new apply method

* adding fix for jobs runs list api

* adding code to cleanse duplicate cols in JobRunsList transform, and added new bronze snapshots in target

* reading mount source from csv implemented (#695)

* reading mount source from csv implemented

* driver workspace should not call search/mount to get source

* review comment implemented

* review comment implemented

* Reading config from delta implemented. (#713)

* reading config from delta implemented.
skip mount point check for AWS added.

* review comment implemented

* review comment implemented

* review comment implemented

* review comment implemented

* review comment implemented

* merge conflict removed

* shuffle partition changed to String (#717)

* shuffle partition changed to String

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* comments implemented

* test cases added

* test cases added

* adding generic api calls function (#756)

* adding generic api calls function

* adding an empty map as return in APIMeta Trait for def getAPIJsonQuery

* adding function getParallelAPIParams

* implemented code review comments

* removed commented lines

* one workspace instance per workspace deployment (#774)

* adding cluster type in jrcp view (#778)

* improved spark conf handler and optimized confs (#773)

* mount mapping validation added (#777)

* mount mapping validation added

* review comments implemented

* review comments implemented

* review comments implemented

* review comments implemented

* Integration Testing - Bug Fixes (#782)

* added persistAndLoad to all writes with tableLocking

* don't perform data validation if path validation fails -- protects against first-run failures especially

* fix excludedScopes

* null config handlers plus proxy scope,key error handler

* added persistAndLoad to all writes with tableLocking

* don't perform data validation if path validation fails -- protects against first-run failures especially

* fix excludedScopes

* null config handlers plus proxy scope,key error handler

* cleanup

* debugging

* fixed futures executions

* additional fixes

* dbu cost fix

* getOrgID bug fix

* target exists enhancement for delta target path validation

* getWorkspaceByDatabase -- cross-cloud remote workspace enabled

* added experimental flag to jrsnapshot and enabled manual module disabling

* rollback and module mapping

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>

---------

Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <109206082+souravbaner-da@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Sriram Mohanty <69749553+sriram251-code@users.noreply.github.com>
Co-authored-by: Aman <91308367+aman-db@users.noreply.github.com>
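Several items in this squashed message are terse; for example, #723 adds pagination logic to the job-runs API call. A minimal sketch of offset/limit pagination, assuming an API that returns one page of results plus a "has more" flag (the general style of the Databricks Jobs runs/list endpoint at the time; treat exact parameter names and page sizes as assumptions), with a stubbed fetch function standing in for the HTTP call. Page, Paginator and fetchAll are hypothetical names, not Overwatch code:

```scala
// Hypothetical sketch (not Overwatch code) of offset/limit pagination:
// keep requesting pages until the API reports that nothing more is left.
final case class Page[A](items: List[A], hasMore: Boolean)

object Paginator {
  /** Collect every item by calling `fetchPage(offset, limit)` repeatedly. */
  def fetchAll[A](limit: Int)(fetchPage: (Int, Int) => Page[A]): List[A] = {
    require(limit > 0, "limit must be positive")

    @annotation.tailrec
    def loop(offset: Int, acc: Vector[A]): Vector[A] = {
      val page = fetchPage(offset, limit)
      val all = acc ++ page.items
      // Stop when the API reports no further pages, or returns an empty page
      // (guards against looping forever if has_more is set incorrectly).
      if (!page.hasMore || page.items.isEmpty) all
      else loop(offset + page.items.size, all)
    }

    loop(0, Vector.empty).toList
  }
}
```

Advancing the offset by the number of items actually returned, rather than by the requested limit, keeps the loop correct if the server returns short pages.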
aman-db added a commit that referenced this pull request on Mar 28, 2023 (same squashed commit message as the Feb 25, 2023 commit above)
sudharshanraja-db pushed a commit that referenced this pull request on Apr 11, 2023 (same squashed commit message as the Feb 25, 2023 commit above)
Labels: enhancement (New feature or request)
Projects: None yet
Development: successfully merging this pull request may close these issues:
[BUG] - SparkEventsBronze - Schema Exception
2 participants