Concurrent Writes - Table Lock for Parallelized Loads #691
Merged
GeekSheikh
merged 5 commits into
sparkEventsBronze_ParSession
from
690-concWrites_With_Lock
Feb 2, 2023
GeekSheikh
changed the title
initi commit -- working
Concurrent Writes - Table Lock for Parallelized Loads
Jan 12, 2023
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
GeekSheikh
added a commit
that referenced
this pull request
Feb 3, 2023
* initial commit
* Test Spark per session
* notes from meeting
* persession implemented
* persession implemented
* persession implemented
* persession implemented
* persession implemented
* persession implemented
* pr review implemented
* 686 - SparkEvents Executor ID Schema Handler
* global Session - initialize configs
* Concurrent Writes - Table Lock for Parallelized Loads (#691)
* initi commit -- working
* code modified preWriteActions added
* re-instanted perform retry for legacy deployments and added comments
* added logging details
* clear sessionsMap on batch runner
---------
Co-authored-by: sriram251-code <sriram.mohanty@databricks.com>
* minor fixes
---------
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
GeekSheikh
added a commit
that referenced
this pull request
Feb 25, 2023
* initial commit
* Changing Serverless to High-Concurrency (#706)
* Changing Serverless to High-Concurrency
* minor changes
---------
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* Spark events bronze par session (#678)
* initial commit
* Test Spark per session
* notes from meeting
* persession implemented
* persession implemented
* persession implemented
* persession implemented
* persession implemented
* persession implemented
* pr review implemented
* 686 - SparkEvents Executor ID Schema Handler
* global Session - initialize configs
* Concurrent Writes - Table Lock for Parallelized Loads (#691)
* initi commit -- working
* code modified preWriteActions added
* re-instanted perform retry for legacy deployments and added comments
* added logging details
* clear sessionsMap on batch runner
---------
Co-authored-by: sriram251-code <sriram.mohanty@databricks.com>
* minor fixes
---------
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
* adding new snapshots to bronze layer (#684)
* adding new snapshots to bronze layer
* changed all the single asset name to plural
* adding transform function for new bronze snaps
* changes applied to improve schema quality
---------
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* apiURL "/" removed and dbsql added (#699)
* bug fix
* bug fix
* bug fix
* bug fix
* Improve acquisition of Cloud Provider and OrgID (#708)
* Improve acquisition of Cloud Provider and OrgID
* Improve acquisition of Cloud Provider and OrgID
* Modularize getOrgID function
* removed old commented version of code
---------
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* Update SilverTransforms.scala (#703)
* Overwatch on photon broadcast exchange perf issue (#705)
* Change the Join to Shuffle_hash Join for collectEventLogPaths
* Change the Join to Shuffle_hash Join for collectEventLogPaths
* adding pagination logic for job-runs api (#723)
* 729 - enable clusterEvents merge Insert (#730)
* enable clusterEvents merge Insert
* added comments
---------
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
* auditRaw - mergeInsertOnly (#738)
* enable clusterEvents merge Insert
* added comments
* 737 - dateGlobFix and auditLogRaw mergeInserts
---------
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
* implemented (#752)
* asStrings implemented for apicallv2 (#707)
* 749 fill meta improved (#753)
* 749 fill meta improved
* put tsPartVal in clsf back to 16
---------
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
* cleaning jobs_snap_bronze new_cluster field (#732)
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
* dbu and cost calculations fixed (#760)
* dbu calcs corrected
* readd aliases
* add runtime_engine to fillforward
* added a few comments to funcs
* corrected workerDBU Cost Value
* enabled remote getWorkspaceByDatabase (#754)
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
* improved first run Impute clusterSpec (#759)
* excluded scope enhanced (#740)
* excluded scope enhanced
* review comment implemented
* modified lowerCase logic
---------
Co-authored-by: Daniel Tomes <10840635+GeekSheikh@users.noreply.github.com>
* adding temp location, start and end time in jobs runs list api (#755)
* adding temp location, start and end time in jobs runs list api
* change in jobRunsList api call
* removed default apiVersion from new apply method
* adding fix for jobs runs list api
* adding code to cleanse duplicate cols in JobRunsList transform, and added new bronze snapshots in target
* reading mount source from csv implemented (#695)
* reading mount source from csv implemented
* driver workspace should not call search/mount to get source
* review comment implemented
* review comment implemented
* Reading config from delta implemented. (#713)
* reading config from delta implemented. skip mount point check for AWS added.
* review comment implemented
* review comment implemented
* review comment implemented
* review comment implemented
* review comment implemented
* merge conflict removed
* shuffle partition changed to String (#717)
* shuffle partition changed to String
* comments implemented
* comments implemented
* comments implemented
* comments implemented
* comments implemented
* comments implemented
* comments implemented
* test cases added
* test cases added
* adding generic api calls function (#756)
* adding generic api calls function
* adding an empty map as return in APIMeta Trait for def getAPIJsonQuery
* adding function getParallelAPIParams
* implemented code review comments
* removed commented lines
* one workspace instance per workspace deployment (#774)
* adding cluster type in jrcp view (#778)
* improved spark conf handler and optimized confs (#773)
* mount mapping validation added (#777)
* mount mapping validation added
* review comments implemented
* review comments implemented
* review comments implemented
* review comments implemented
* Integration Testing - Bug Fixes (#782)
* added persistAndLoad to all writes with tableLocking
* dont perform data validation if path validation fails -- protrects first run failures especially
* fix excludedScopes
* null config handlers plus proxy scope,key error handler
* added persistAndLoad to all writes with tableLocking
* dont perform data validation if path validation fails -- protrects first run failures especially
* fix excludedScopes
* null config handlers plus proxy scope,key error handler
* cleanup
* debugging
* fixed futures executions
* additional fixes
* dbu cost fix
* getOrgID bug fix
* target exists enhancement for delta target path validation
* getWorkspaceByDatabase -- cross-cloud remote workspace enabled
* added experimental flag to jrsnapshot and enabled manual module disabling
* rollback and module mapping
---------
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
---------
Co-authored-by: geeksheikh <geeksheikh@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <109206082+souravbaner-da@users.noreply.github.com>
Co-authored-by: Sourav Banerjee <30810740+Sourav692@users.noreply.github.com>
Co-authored-by: Sriram Mohanty <69749553+sriram251-code@users.noreply.github.com>
Co-authored-by: Aman <91308367+aman-db@users.noreply.github.com>
aman-db
added a commit
that referenced
this pull request
Mar 28, 2023
sudharshanraja-db
pushed a commit
that referenced
this pull request
Apr 11, 2023
Closes #690.
@Sriram-databricks, there are some TODOs in Database.scala.
Please review this closely; it was just a prototype. Rip out all the old retry code that has been commented out for now.
Also, please run some local tests and validations. We still need to add some comments, but I needed to get this working for a customer.