Linking Multiple Instances
Schedoscope lets you use views maintained by a different Schedoscope instance. This enables you to run two (or more) instances independently, where the views of one instance depend on views managed by another. Such dependencies are called external dependencies; the view at the far end of an external dependency is called an external view.
Multiple Schedoscope instances do not talk to each other directly via a network protocol. They also do not have to be available at the same time, so communication between them has to be buffered. The shared Hive metastore serves this purpose: it is the only channel through which the instances synchronize.
No setup or configuration is necessary on the foreign Schedoscope instance. You only have to adapt the instance that wants to access external views.
To build on top of views of another Schedoscope instance, you first have to include the foreign instance's view definitions in your classpath/dependencies, using Maven or the build tool of your choice. This is necessary to be able to reference external views.
The following section has to be included in your schedoscope.conf:
    external-dependencies {
      #
      # This setting allows you to use external dependencies and operate several schedoscope instances in conjunction.
      #
      enabled = true
      #
      # A list of prefixes of packages with internal views. Every package not starting with a string in this list
      # will be treated as external and can not be referenced from the client or used as dependency if not flagged as
      # external
      #
      home = ["${env}.datahub", "${env}.datamart"]
      #
      # Toggles checks whether internal views are used as external views and vice versa
      #
      checks = true
    }
You have to include all internal packages/views in the home list. Views declared outside of these packages can only be used as external dependencies. Conversely, views inside these packages cannot be used as external dependencies. As long as checks are enabled, Schedoscope will notify you if you violate these rules.
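The home-list rule above can be illustrated with a small standalone sketch. It is not Schedoscope's implementation: the object `HomeCheck`, its methods, and the assumption that `${env}` resolves to `dev` are all hypothetical and serve only to show how prefix matching separates internal from external packages.

```scala
// Illustrative sketch only; HomeCheck is a hypothetical helper, not part of Schedoscope.
object HomeCheck {
  // Resolved prefixes of the `home` setting, assuming ${env} resolves to "dev".
  val home: Seq[String] = Seq("dev.datahub", "dev.datamart")

  // A package counts as internal when it starts with one of the home prefixes.
  def isInternal(pkg: String): Boolean =
    home.exists(prefix => pkg.startsWith(prefix))

  // Returns a message when the usage violates the rules, None otherwise.
  def check(pkg: String, usedAsExternal: Boolean): Option[String] =
    (isInternal(pkg), usedAsExternal) match {
      case (true, true) =>
        Some(s"$pkg is internal and cannot be used as an external dependency")
      case (false, false) =>
        Some(s"$pkg is not listed in home and can only be used as an external dependency")
      case _ =>
        None
    }
}
```

With these assumed prefixes, `HomeCheck.check("dev.shop.core", usedAsExternal = true)` passes, while using `dev.datahub.products` as an external dependency is reported as a violation.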
A view declares an external dependency by wrapping the foreign view in external(...) inside dependsOn:

    case class Products(shopCode: Parameter[String],
                        year: Parameter[String],
                        month: Parameter[String],
                        day: Parameter[String]) extends View
      with PointOccurrence
      with JobMetadata
      with DailyParameterization {

      val shop = dependsOn(() => external(ExternalShop(shopCode)))

      val productId = fieldOf[String]
      val productName = fieldOf[String]

      transformVia(() =>
        HiveTransformation(insertInto(
          this,
          s"""SELECT * FROM ${shop().n}""")))
    }
The process of materializing a view with external dependencies does not change. Upon receiving a materialize command, the view simply checks in the metastore whether the external dependency has changed.
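As a rough mental model of that check, the sketch below compares the metadata last observed for an external view with what is currently recorded. It is a simplification under stated assumptions: `ViewMetadata`, `ExternalChange`, and the exact comparison are hypothetical, and the real logic in Schedoscope may differ.

```scala
// Hypothetical snapshot of the metadata a view could read from the metastore.
final case class ViewMetadata(checksum: String, timestamp: Long)

// Illustrative sketch only; not Schedoscope's actual change detection.
object ExternalChange {
  // The external dependency counts as changed when it has never been observed,
  // when its checksum differs, or when its timestamp is newer than the last one seen.
  def hasChanged(lastSeen: Option[ViewMetadata], current: ViewMetadata): Boolean =
    lastSeen match {
      case None       => true
      case Some(prev) =>
        prev.checksum != current.checksum || current.timestamp > prev.timestamp
    }
}
```

Because the comparison relies only on what is stored in the metastore, the foreign instance does not need to be running when the dependent view is materialized.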
If you want to trigger a linked Schedoscope instance as soon as the topmost views in your Schedoscope instance have been materialized, you can do so by introducing a toplevel view. The toplevel view depends on all views at the end of your data pipeline. On the toplevel view you can register a shell transformation that triggers the linked Schedoscope instance, e.g., via a curl command.
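The script for such a shell transformation could be assembled as in the following sketch. The helper `TriggerScript`, the host, the port, and the `/materialize/...` URL layout are assumptions for illustration; check the HTTP API of the linked instance for the actual endpoint.

```scala
// Illustrative sketch only; TriggerScript is a hypothetical helper.
object TriggerScript {
  // Builds a curl call against the linked instance.
  // -s suppresses progress output; -f makes curl exit non-zero on HTTP errors,
  // so a failed trigger surfaces as a failed shell command.
  def materializeCommand(host: String, port: Int, viewPath: String): String =
    s"curl -s -f 'http://$host:$port/materialize/$viewPath'"
}
```

The resulting string, for example from `TriggerScript.materializeCommand("linked-host", 20698, "dev.datamart/Report")`, would then be used as the script of the toplevel view's shell transformation.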
Operating linked Schedoscope instances is not maintenance-free. If you make drastic changes in one of your Schedoscope instances, all linked instances might be affected as well. For example, resetting checksums and timestamps on a large scale requires synchronization between the instances.