diff --git a/CHANGELOG.md b/CHANGELOG.md index 761bc18ae58..8dee57f56ea 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -56,6 +56,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) ### 📝 Documentation +* [MD] Add design documents of multiple data source feature [#2538](https://github.com/opensearch-project/OpenSearch-Dashboards/pull/2538) ### 🛠 Maintenance - Adding @zhongnansu as maintainer. ([#2590](https://github.com/opensearch-project/OpenSearch-Dashboards/pull/2590)) diff --git a/docs/multi-datasource/client_management_design.md b/docs/multi-datasource/client_management_design.md new file mode 100644 index 00000000000..389e2e40825 --- /dev/null +++ b/docs/multi-datasource/client_management_design.md @@ -0,0 +1,226 @@ +# Multi Data Source Client Management + +## 1. Problem Statement + +This design is part of the OpenSearch Dashboards multi data source project [[RFC](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388)], where we need to manage and expose datasource clients. Connections are established by creating clients that can then be used by a caller to interact with any data source (OpenSearch is the only data source type in scope at this phase). + +**Overall the critical problems we are solving are:** + +1. How to set up connection(clients) for different data sources? +2. How to expose data source clients to callers through clean interfaces? +3. How to maintain backwards compatibility if user turn off this feature? +4. How to manage multiple clients/connection efficiently, and not consume all the memory? + +## 2. Requirements + +1. **Accessibility**: + 1. Clients need to be accessible by other OpenSearch Dashboards plugins or modules through interfaces, in all stages of the plugin lifecycle. E.g “Setup”, and “Start” + 2. Clients should be accessible by plugin through request handler context. +2. **Client Management**: Clients needs to be reused in a resource-efficient way to not harm the performance. +3. 
**Backwards compatibility**: If a user enables this feature and later disables it, any related logic should pick up the config change and handle every resulting use case by: + 1. Either switching to connect to the default OpenSearch cluster + 2. Or blocking the connection to the data source and throwing an error +4. **Auditing:** Need to log each user's queries against each data source, for troubleshooting or log analysis + +## 3. Architecture/Dataflow + +- We are adding a new service in core to manage data source clients, and expose an interface for plugins and modules to access the data source client. +- Existing OpenSearch services and saved object services should not be affected by this change + +#### 3.1 Dataflow of the plugin call sequence (using the viz plugin as an example) to retrieve data from any datasource. + +![img](./img/client_management_dataflow.png) + +#### 3.2 Architecture Diagram + +![img](./img/client_management_architecture.png) + +## 4. Detailed Design + +### 4.0 Answer some critical design questions + +**1.** **How to set up connection(clients) for different datasources?** +Similar to how OpenSearch Dashboards currently talks to the default OS by creating an opensearch node.js client using the [opensearch-js](https://github.com/opensearch-project/opensearch-js) library, we also create a client for each datasource. The critical params that differentiate data sources are `url` and `auth` + +```ts +const { Client } = require('@opensearch-project/opensearch'); + +const dataSourceClient = new Client({ + node: url, + auth: { + username, + password, + }, + ...OtherClientOptions, +}); + +dataSourceClient.search(); +dataSourceClient.ping(); +``` + +**2. How to expose datasource clients to callers through clean interfaces?** +We create a `data source service`, similar to the existing `opensearch service` in core, which provides the client of the default OS cluster. This new service will be dedicated to providing clients for data sources. 
Following the same paradigm, we can register this new service to `CoreStart` and `CoreRouteHandlerContext`, in order to expose the data source client to plugins and modules. The interface is exposed from the new service, so it doesn’t interfere with any existing services and keeps the interface clean. + +```ts +// Existing +const defaultClient: OpenSearchClient = core.opensearch.client.asCurrentUser; + +// With opensearch_data_services added +const dataSourceClient: OpenSearchClient = core.opensearchData.client; +``` + +**3. How to maintain backwards compatibility if user turns off this feature?** +The context is that a user can only turn the multiple datasource feature on or off by updating the boolean config `data_source.enabled` in `opensearch_dashboards.yml` and rebooting. + +1. **Browser side**: if the datasource feature is turned off, the browser should detect the config change and update the UI so that no request can be submitted to any datasource. Multiple datasource related UI shouldn't render. If the request is not submitted to a datasource, the logic won’t return a datasource client at all. +2. **Server side**: a user may still submit a request to a datasource manually, on purpose, or a plugin may try to access the datasource client from the server side. In the corresponding core service we’ll have a **flag** that maps to the **enable_multi_datasource** boolean config, and throw an error if the API is called while this feature is turned off. + +**4. How to manage multiple clients/connection efficiently, and not consume all the memory?** + +- For datasources with different endpoints, use client pooling (e.g. LRU cache) +- For data sources with the same endpoint but different users, use the connection pooling strategy (child client) provided by opensearch-js. + +**5. Where should we implement the core logic?** +The current `opensearch service` exists in core. The module we'll implement is functionally similar, but we choose to implement `data source service` in a plugin along with the `crypto` service for the following reasons. 
+ +1. Data source is a feature that can be turned on or off. Plugins are designed for exactly this kind of pluggable use case. +2. We don't interfere with OpenSearch Dashboards core. Since this is an experimental feature, the risk of breaking existing behavior is lower if we use a plugin. Worst case, the user could just uninstall the plugin. +3. Complexity wise, it's about the same amount of work. + +### 4.1 Data Source Plugin + +Create a data source plugin that only has server side code, to hold most of the core logic of the data source feature, including the data service, crypto service, and client management. A plugin has setup, start and stop as its lifecycle. + +**Functionality** + +- Setup plugin configuration such as `data_source.enabled` +- Define and register datasource as a new saved object type +- Initiate data source service and crypto service +- Register API to get datasource client to core route handler context +- Setup logging and auditing +- Stop all running services in plugin `stop()` phase + +### 4.1 Data Source Service + +We need to create a data source service in the data source plugin, to provide the main functionality and APIs for callers to `getDataSourceClient()`. A service in a plugin has setup, start and stop as its lifecycle. + +**Functionality** + +- Initialize the client pool as an empty data structure, with its size mapped to the user config value (`data_source.clientPool.size`). +- Configure a data source client and expose it as `getDataSourceClient()` at the service level. + +### 4.2. Data source client + +We need to configure the data source client by either creating a new one, or looking up the client pool. + +**Functionality** + +- Get data source meta info: Use the saved object client to retrieve data source info from the OpenSearch Dashboards system index by id, and parse the results into a `DataSource` object. 
+ + ```ts + const dataSource = { + title: 'ds-sample', + description: 'data source', // optional + endpoint: 'http://opensearch.com', + auth: { + type: 'Basic Auth', + username: 'user name', + password: 'encrypted content', + }, + }; + ``` + +- Get root client: Look up the client pool by **endpoint** and return the client if it exists. On a miss, we create a new client instance and load it into the pool. At this step, the client won't have any auth info. + +- Get credentials: Call crypto service utilities to **decrypt** user credentials from the `DataSource` object. +- Assemble the actual query client: With the auth info and root client, we’ll leverage the opensearch-js connection pooling strategy to create the actual query client from the root client by `client.child()`. + +#### 4.2.1 Legacy Client + +OpenSearch Dashboards is forked from Kibana 7.10. At the time of the fork, there were 2 types of client used in the codebase. One is the new client, which was later migrated to `opensearch-js`; the other is the legacy client, which is `elasticsearch-js`. Legacy clients are still used by many critical features, such as visualization and index pattern management, alongside the new client. + +```ts +// legacy client +context.core.opensearch.legacy.client.callAsCurrentUser; +// new client +context.core.opensearch.client.asCurrentUser; +``` + +Since deprecating the legacy client is a project of much bigger scope, the multiple data source feature still needs to implement a substitute for it for now. The implementation should be decoupled from the data source client as much as possible, for easier deprecation, similar to the [opensearch legacy service](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/main/src/core/server/opensearch/legacy) in core. + +```ts +context.dataSource.opensearch.legacy.getClient(dataSourceId); +``` + +### 4.3 Register datasource client to core context + +This is for plugins to access the data source client via the request handler. For example, by `core.client.search(params)`. 
It’s a very common use case for a plugin to access the cluster while handling a request. In fact, the data plugin uses it in its search module to get the client, which is covered in detail in the next section. + +- **param** + - **dataSourceId**: needed to retrieve the **datasource info** for either creating a new client or looking up the client pool +- **return type:** OpenSearchClient + ```ts + core.http.registerRouteHandlerContext( + 'dataSource', + { + opensearch: { + getClient: (dataSourceId: string) => { + ... + return dataSourceService.getDataSourceClient() + } + } + }); + ``` + +### 4.4 Refactor data plugin search module to call core API to get datasource client + +`Search strategy` is the low level API of the data plugin search module. It retrieves clients and queries OpenSearch. It needs to be refactored to switch between the default client and the datasource client, depending on whether a request is sent to a datasource or not. + +Currently the default client is retrieved by the search module of the data plugin to interact with OpenSearch through this API call. Ref: [opensearch-search-strategy.ts](https://github.com/opensearch-project/opensearch-dashboards/blob/e3b34df1dea59a253884f6da4e49c3e717d362c9/src/plugins/data/server/search/opensearch_search/opensearch_search_strategy.ts#L75) + +```ts +const client: OpenSearchClient = core.opensearch.client.asCurrentUser; +// use API provided by opensearch-js lib to interact with OpenSearch +client.search(params); +``` + +Similarly, we’ll have the following for the datasource use case. `AsCurrentUser` doesn’t make sense for a datasource, because the client is always created, or looked up in the client pool, with the “current” user credential defined in the “datasource”. 
+ +```ts +let client: OpenSearchClient; +if (request.dataSource) { + client = await core.opensearchData.getClient(); +} else { +// existing logic to retrieve default client + client = core.opensearch.client.asCurrentUser; +} + +// use API provided by opensearch-js lib to interact with OpenSearch +client.ping() +client.search(params) +``` + +### 4.5 Client Management + +When loading a dashboard with visualizations, each visualization sends at least 1 request to the server side to retrieve data. With the multiple data source feature enabled, multiple requests are sent to multiple datasources, which requires multiple clients. If we return a new client **per request**, it will soon fill up the memory and sockets with idle clients. Of course we can close a client at any time, but the connection is supposed to be kept alive for easy reload and periodic data polling. Therefore, we need a better solution to manage clients efficiently. + +#### Client pooling by LRU cache + +- Key: data source endpoint +- Value: OpenSearch client object +- Configurable pool size: `data_source.clientPool.size`, defaults to 5 +- Use the existing js `lru-cache` lib in OpenSearch Dashboards, which enables easy initialization, lookup, and eviction of outdated clients. +- While stopping the service, we can close all the connections by looping over the LRU cache and calling `client.close()` for each. +- For data sources with the same endpoint but different users, use the connection pooling strategy (child client) provided by opensearch-js. + +```ts +import LRUCache from 'lru-cache'; + +export class OpenSearchClientPool { + private cache?: LRUCache + ... +``` + +## 5. 
Audit & Logging + +[#1986](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1986) diff --git a/docs/multi-datasource/high_level_design.md b/docs/multi-datasource/high_level_design.md new file mode 100644 index 00000000000..f89bd3d0ff3 --- /dev/null +++ b/docs/multi-datasource/high_level_design.md @@ -0,0 +1,146 @@ +# Multiple Data Source Support High Level Design + +OpenSearch Dashboards is designed and implemented to only work with one single OpenSearch cluster. This document discusses the design to enable OpenSearch Dashboards to work with multiple OpenSearch endpoints, so that it can serve as a centralized data visualization and analytics application. + +For more context, see RFC [Enable OpenSearch Dashboards to support multiple OpenSearch clusters](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388) + +## User Stories + +[OpenSearch Dashboards Multiple OpenSearch Data Source Support User Stories](user_stories.md) + +From a very high level, we introduce `data-source` as a new OpenSearch Dashboards saved object type. + +## Terminologies + +- **Dashboards metadata**: refers to data documents saved in the `.kibana` index. Equivalent to Dashboards **saved objects**. +- **User data**: in this document, user data refers to the log, metrics or search catalog data that is saved in OpenSearch. Users run analysis against this user data with OpenSearch Dashboards. +- **Data source**: an OpenSearch endpoint. It could be an on-prem cluster, an AWS managed OpenSearch domain or a serverless collection, which stores the user log/metrics data for visualization and analytics purposes. + - in this document, we may also refer to data source as a new type of OpenSearch Dashboards saved object, which is a data model to describe a data source, including endpoint, auth info, capabilities etc. 
+ +## Scope + +We are targeting to release the multiple data source support in the OpenSearch 2.4 preview as an experimental feature, and make it GA over a few minor versions within 2.x. + +### Preview Scope + +- data source only supports basic authentication with OpenSearch + - API key, JWT, Sigv4 and other auth types are out of scope +- data source will only work with visualizations and discover + - plugins like AD/Alerting/ISM don’t work with data source + - DevTool console may be in scope depending on progress and resources + - Observability visualizations are out of scope +- data source support can be enabled/disabled based on config in the OpenSearch Dashboards yml config file +- multiple data source project doesn’t change the existing security experience + - e.g. if a user has access to a security tenant, they will be able to use the data sources defined in that tenant + +### GA Scope + +- Support all Elasticsearch 7.10 DSL/API compatible data sources, including customer self managed Elasticsearch 7.10, OpenSearch 2.x clusters, AWS managed OpenSearch and Elasticsearch 7.10 domains, and OpenSearch Serverless collections. + - Support Basic auth, AWS SigV4 signing with Data sources +- OpenSearch Dashboards plugins such as Alerting/AD etc. 
can work with each data source depending on the data source capability +- Observability visualizations are out of scope +- Support of different (major) versions of ES/OpenSearch data sources is out of scope + +## Requirements + +### Functional requirements + +- OpenSearch Dashboards users should be able to dynamically add/view/update/remove OpenSearch data sources using UI and API +- OpenSearch Dashboards users should be able to save/update/remove credentials( username/password in preview, and AWS Sigv4 in GA) +- OpenSearch Dashboards users can create index pattern with specific data source +- Data source credentials should be handled securely +- OpenSearch Dashboards users can put data visualizations of different data sources into one dashboard +- OpenSearch analytics and management functions (such as AD, ISM and security) can work with specific data source to manage those functions in corresponding data source + - such as user can choose a data source and then edit/view Anomaly detectors and security roles with OpenSearch Dashboards +- OpenSearch Dashboards should be able to work with self managed and AWS managed + +### Limitations + +- One index pattern can only work with one data source +- One visualization will still only work with one index pattern +- Plugins like AD and alerting will only work with one data source at any point of time + +## Design + +### Introducing data source saved object model + +Generally, OpenSearch Dashboards works with 2 kinds of data: + +1. User data, such as application logs, metrics, and search catalog data in data indices. +2. OpenSearch Dashboards metadata, which are the saved objects in `.kibana` index + +Currently both OpenSearch Dashboards metadata and user data indices are saved in the same OpenSearch cluster. 
However in the case to support OpenSearch Dashboards to work with multiple OpenSearch data sources, OpenSearch Dashboards metadata index will be stored in one OpenSearch cluster, and user data indices will be saved in other OpenSearch clusters. Thus we will need to differentiate OpenSearch Dashboards metadata operations and user data access. + +OpenSearch Dashboards admin will still define an OpenSearch cluster in the `opensearch.host` config in `opensearch_dashboards.yml` file. It will be used as the OpenSearch Dashboards metadata store, and OpenSearch Dashboards metadata will still be saved in the `.kibana` index in this OpenSearch cluster. + +Regarding the user data access, we propose to add a new “data-source” saved objects type, which describes a data source connection, such as + +- cluster endpoint +- auth info, like auth types and credentials to use when accessing the data source +- data source capabilities, such as if the data source supports AD/ISM etc. + +Users can dynamically add data source in OpenSearch Dashboards using UI or API, OpenSearch Dashboards will save the data source saved objects in its metadata index. And then users can do as they want with their data sources. For example, when OpenSearch Dashboards needs to access user data on behalf of the customer, customer will need to specify a data source id, then OpenSearch Dashboards can fetch the data source info from its metadata store, then send the request to the corresponding data source endpoint. 
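
The lookup flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual saved object schema or service API: `DataSourceAttributes`, `InMemoryMetadataStore` and `resolveEndpoint` are hypothetical names, and the in-memory map stands in for the `.kibana` metadata index.

```typescript
// Hypothetical shape of a "data-source" saved object. Field names are
// illustrative assumptions, not the final schema.
interface DataSourceAttributes {
  title: string;
  endpoint: string;
  auth: { type: string; username: string; encryptedPassword: string };
  capabilities: string[]; // e.g. ['AD', 'ISM']
}

// In-memory stand-in for the metadata store (the `.kibana` index).
class InMemoryMetadataStore {
  private readonly docs = new Map<string, DataSourceAttributes>();

  save(id: string, attributes: DataSourceAttributes): void {
    this.docs.set(id, attributes);
  }

  get(id: string): DataSourceAttributes {
    const doc = this.docs.get(id);
    if (doc === undefined) {
      throw new Error(`Data source not found: ${id}`);
    }
    return doc;
  }
}

// Pick the endpoint for a user-data request: the referenced data source when
// an id is given, otherwise the default cluster from opensearch_dashboards.yml.
function resolveEndpoint(
  store: InMemoryMetadataStore,
  defaultEndpoint: string,
  dataSourceId?: string
): string {
  if (dataSourceId === undefined) {
    return defaultEndpoint;
  }
  return store.get(dataSourceId).endpoint;
}

const store = new InMemoryMetadataStore();
store.save('ds-1', {
  title: 'logs-cluster',
  endpoint: 'https://logs.example.com:9200',
  auth: { type: 'basic', username: 'reader', encryptedPassword: '<ciphertext>' },
  capabilities: ['AD'],
});

console.log(resolveEndpoint(store, 'https://localhost:9200', 'ds-1')); // https://logs.example.com:9200
console.log(resolveEndpoint(store, 'https://localhost:9200')); // https://localhost:9200
```

Keeping the default endpoint as the fallback mirrors the requirement that requests without a data source id keep going to the cluster configured in `opensearch_dashboards.yml`.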
+ +So the Dashboards and OpenSearch setup may look like:![img](./img/hld_setup_diagram.png) + +Refer to the proposed solution in [#1388](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388) for the data modeling of data source + +### Data source integration + +[opensearch_service](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/main/src/core/server/opensearch) is one of the core modules of OpenSearch Dashboards. It is a singleton instance that manages the OpenSearch Dashboards connection with the backend OpenSearch endpoint. It adds another level of abstraction over the OpenSearch client, and provides a set of interfaces for other OpenSearch Dashboards modules and plugins to interact with OpenSearch, for example running DSL queries or calling arbitrary OpenSearch APIs. + +Currently, OpenSearch Dashboards only works with one OpenSearch cluster: the OpenSearch Dashboards metadata index and user data indices are stored in the same OpenSearch cluster. So the OpenSearch Dashboards [saved object service](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/main/src/core/server/saved_objects), the core OpenSearch Dashboards module that handles all OpenSearch Dashboards metadata operations, also relies on `opensearch_service` interfaces to work with OpenSearch. + +With multi-datasource, we will need to split the `opensearch_service` for these 2 use cases. We propose to fork a new `metadata_client` from the existing `opensearch_service` to manage the metadata store connection, so that `saved_objects_service` can use `metadata_client` to perform saved objects operations. We then repurpose the `opensearch_service` to serve the user data access use cases. The new `opensearch_service` will expose the following interface to allow other OpenSearch Dashboards components to interact with a specific data source cluster. 
+ +``` +core.opensearch.withDataSource().callAsCurrentUser(searchParams) +``` + +OpenSearch Dashboards plugins like the data plugin and alerting plugin will need to introduce the data source concept into their use cases, letting users specify a data source when using their functions, and then switch to this new opensearch interface when calling OpenSearch APIs or executing queries. + +### Visualization solution with support of multiple datasource + +The current OpenSearch Dashboards visualization solution relies on 3 major saved object types: index-pattern, visualization and dashboard. + +- Index pattern is a level of data abstraction. An index pattern describes a set of data indices and their data schema. +- Visualization starts with an index pattern. OpenSearch Dashboards users can create data visualizations against an index pattern. A visualization includes the OpenSearch DSL query, aggregation and a reference to an index pattern, as well as graph metadata such as legend and labels. When rendering a visualization graph, the visualization executes the query & aggregation against that specific index pattern, and draws the graph according to the graph settings. +- Dashboard references visualizations. OpenSearch Dashboards users can place a set of visualizations into a dashboard. An OpenSearch Dashboards dashboard describes the layout and controls (time picker, field filters) of all visualizations on the dashboard. + +To support multiple data sources in OpenSearch Dashboards, we will add the “data source” saved object as a reference to the index pattern model. An index pattern can refer to only one data source, while one data source can be used by multiple index patterns. + +With this new “data source” reference in index pattern, OpenSearch Dashboards users will need to first create data sources in OpenSearch Dashboards, then select a data source when creating index patterns. 
Then the visualization and dashboard creation experience will remain the same. Any other saved object type that references an index-pattern, directly or through another saved object, will also be able to retrieve data from the data source out of the box. + +- For OpenSearch Dashboards multiple data source user experience, refer to [OpenSearch Dashboards Multiple OpenSearch Data Source Support User Stories](https://quip-amazon.com/VXQ0AhpPs3gU) + +- The OpenSearch Dashboards visualization rendering flow will look like the following with multi-datasource support: ![image](./img/hld_vis_flow.png) + +### Backward Compatibility + +We plan to release this multi-datasource support as an experimental feature with OpenSearch 2.4. OpenSearch Dashboards admins will be able to enable or disable the multi-datasource feature using configurations in `opensearch_dashboards.yml`. + +If multi-datasource is enabled, OpenSearch Dashboards users will be able to see all data source related features and APIs, so that they can manage their data sources and build visualizations and dashboards with data sources. If multi-datasource is disabled, users will not see anything related to data sources, and their OpenSearch Dashboards experience will remain the same as with a single data source. + +If an OpenSearch Dashboards admin enables multi-datasource for an existing OpenSearch Dashboards service, users will still be able to use their existing index patterns and visualizations, which will by default fetch data from the same endpoint as their metadata store. + +If an OpenSearch Dashboards service has enabled multi-datasource, and it already has an index pattern with a remote data source created, the admin will not be able to disable the multi-datasource feature. OpenSearch Dashboards will fail to start if it detects a data source in the saved objects while multi-datasource is disabled. 
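
One way to picture the startup rule above is a guard that runs before the server finishes booting. The sketch below is only an assumption about how such a check could look: `IndexPatternSummary` and `checkMultiDataSourceConfig` are hypothetical names, and the real check would read saved objects from the metadata store rather than take an array.

```typescript
// Hypothetical, simplified view of an index pattern saved object.
interface IndexPatternSummary {
  id: string;
  dataSourceId?: string; // absent for patterns that use the default cluster
}

// Throws when the feature is disabled but saved objects still point at a
// remote data source; otherwise returns normally.
function checkMultiDataSourceConfig(
  enabled: boolean,
  indexPatterns: IndexPatternSummary[]
): void {
  if (enabled) {
    return;
  }
  const remote = indexPatterns.filter((p) => p.dataSourceId !== undefined);
  if (remote.length > 0) {
    const ids = remote.map((p) => p.id).join(', ');
    throw new Error(
      `multi-datasource is disabled, but index patterns reference data sources: ${ids}`
    );
  }
}
```

Index patterns without a data source reference pass the check in both modes, which matches the statement that existing index patterns keep reading from the same endpoint as the metadata store.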
+ +### Security + +#### Data source access control + +The multi-datasource project doesn’t plan to change the security (authN & authZ) controls for OpenSearch Dashboards. The `data-source` is a new type of saved object, so access control for `data source` will work the same way as for other saved objects such as index patterns and visualizations. + +Based on existing OpenSearch and OpenSearch Dashboards security implementations, OpenSearch Dashboards saved objects access control is implemented via `security tenants`. OpenSearch users are mapped to a set of roles, and each role has corresponding permission to access certain tenants. If a user has permission to access a tenant, they will be able to access all saved objects in that tenant. With this mechanism, if a user creates a data source in a shared tenant, other users who have access to that shared tenant will be able to see the data source object and see/create visualizations with the data source. + +#### Data source credential handling + +Credentials are part of the data source object, and will be saved in the OpenSearch Dashboards metadata index. OpenSearch Dashboards will use those credentials to authenticate with the data source when executing queries. The credentials will need to be encrypted regardless of whether OpenSearch Dashboards has access control or not. + +We will use a symmetric key to encrypt the credentials before saving the data source into the OpenSearch Dashboards metadata index, and use the same key to decrypt them when OpenSearch Dashboards needs to authenticate with the corresponding data source. For the open source release, we will allow admins to configure the encryption key in the `opensearch_dashboards.yml` file. 
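
As a rough illustration of the symmetric-key idea, the sketch below encrypts and decrypts a credential with Node's built-in `crypto` module. The choice of AES-256-GCM, the `EncryptedCredential` shape and both function names are assumptions made for this example; the strategy actually adopted for the feature is the one discussed in issue #1756.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Hypothetical storage format: everything needed to decrypt except the key.
interface EncryptedCredential {
  iv: string; // base64, unique per encryption
  tag: string; // base64 GCM authentication tag
  data: string; // base64 ciphertext
}

// Encrypt a credential with a 32-byte symmetric key (AES-256-GCM is an
// assumption for this sketch, not the documented algorithm).
function encryptCredential(plaintext: string, key: Buffer): EncryptedCredential {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    data: data.toString('base64'),
  };
}

// Decrypt with the same key; throws if the key is wrong or the data was altered.
function decryptCredential(encrypted: EncryptedCredential, key: Buffer): string {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(encrypted.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(encrypted.tag, 'base64'));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(encrypted.data, 'base64')),
    decipher.final(),
  ]);
  return plain.toString('utf8');
}
```

An authenticated mode is used here so that a wrong key or a tampered saved object fails loudly at decryption time instead of yielding a garbage password.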
+ +For more about credential encryption/decryption strategy, refer to [#1756](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1756) + +#### Auditing + +As part of the security effort, OpenSearch Dashboards needs to support the logging for all use of data sources, so that admins can have a clear view of which OpenSearch Dashboards user accessed data source, and queried data from that data source. The audit log could be saved in the metadata store, or local logs for potential auditing work. diff --git a/docs/multi-datasource/img/client_management_architecture.png b/docs/multi-datasource/img/client_management_architecture.png new file mode 100644 index 00000000000..548741bfc5c Binary files /dev/null and b/docs/multi-datasource/img/client_management_architecture.png differ diff --git a/docs/multi-datasource/img/client_management_dataflow.png b/docs/multi-datasource/img/client_management_dataflow.png new file mode 100644 index 00000000000..f0937167d69 Binary files /dev/null and b/docs/multi-datasource/img/client_management_dataflow.png differ diff --git a/docs/multi-datasource/img/dsm_flow.png b/docs/multi-datasource/img/dsm_flow.png new file mode 100644 index 00000000000..3dc4d3f72f4 Binary files /dev/null and b/docs/multi-datasource/img/dsm_flow.png differ diff --git a/docs/multi-datasource/img/hld_setup_diagram.png b/docs/multi-datasource/img/hld_setup_diagram.png new file mode 100644 index 00000000000..15854999b39 Binary files /dev/null and b/docs/multi-datasource/img/hld_setup_diagram.png differ diff --git a/docs/multi-datasource/img/hld_vis_flow.png b/docs/multi-datasource/img/hld_vis_flow.png new file mode 100644 index 00000000000..08bf027ffc1 Binary files /dev/null and b/docs/multi-datasource/img/hld_vis_flow.png differ diff --git a/docs/multi-datasource/resources/client_management_architecture.puml b/docs/multi-datasource/resources/client_management_architecture.puml new file mode 100644 index 00000000000..21cd6e9fa76 --- /dev/null +++ 
b/docs/multi-datasource/resources/client_management_architecture.puml @@ -0,0 +1,117 @@ +@startuml +hide stereotype +skinparam nodesep 6 +skinparam ranksep 10 + +skinparam component { +backgroundColor<> Green +backgroundColor<> Orange +backgroundColor<> LightYellow +backgroundColor<> LightGrey +backgroundColor<> Khaki +backgroundColor<> Grey +backgroundColor<> LightGrey +} +skinparam rectangle { +backgroundColor<> Green +backgroundColor<> Orange +backgroundColor<> LightYellow +backgroundColor<> LightGrey +backgroundColor<> Khaki +backgroundColor<> Grey +backgroundColor<> LightGrey +} +skinparam node { +backgroundColor<> Green +backgroundColor<> Orange +backgroundColor<> LightYellow +backgroundColor<> LightGrey +backgroundColor<> Khaki +backgroundColor<> Grey +backgroundColor<> LightGrey +} + +title ** OSD Multi Data Source Client Management Architecture ** + +node "Dashboards" as cluster { +rectangle "Legend" { + rectangle "New" as new <> + rectangle "Modified" as modify <> + rectangle "Existing" as existing <> + rectangle "External" as external <> + new -[hidden]right- modify + modify -[hidden]right- existing + existing -[hidden]right- external +} + + rectangle "Other Plugins" <> { + rectangle "visualization" as viz <> { + + } + rectangle "Alerting or other" as a <> { + + } + } + rectangle "Data Plugin" <> as dp { + rectangle "Search Module" as sm <> { + rectangle "Search Source" <> as source { + + } + rectangle "Search Strategy" as strategy <> { + + } + + } + interface "DataPluginStart" as dps + } + + rectangle "OpenSearch Data Source Plugin" as ods <> { + component "DataSource Service" as ds + interface "PluginSetUp" as dsps + component "Crypto Service" as cs + } + + rectangle "Core" <> as core { + + rectangle "opensearch service" as os_service <> { + component "internal/scoped client" as ic <> + interface "ServiceStart" as osss + + } + interface "CoreStart" as core_start + rectangle "saved object service" as sos <> { + interface "ServiceStart" as soss + } + + 
interface "CoreRouteHandlerContext" as cc <> + + + } + + ds --> es: query + source -> strategy: call + strategy --> cc: get datasource client + viz --> dps: special viz types + viz --> source + dps --> sm + sos --> os: get saved objects + core_start --> cc + core_start <.. a: get client + a ..> dps + a ...> cc: get client + core_start <-- osss: register + osss <-- soss: depends + ic --> os: query + ds -> cs: decrypt credential + dsps ---> cc: register + dp --[hidden]-- ods + + rectangle "Default OpenSearch" <> as os { + } + rectangle "Datasource(OpenSearch)" <> as es { + + } +} + + +@enduml \ No newline at end of file diff --git a/docs/multi-datasource/resources/client_management_dataflow.puml b/docs/multi-datasource/resources/client_management_dataflow.puml new file mode 100644 index 00000000000..fa778f8b0ce --- /dev/null +++ b/docs/multi-datasource/resources/client_management_dataflow.puml @@ -0,0 +1,75 @@ +@startuml +autoactivate on + +title ** Multiple Datasource Visualization call sequence ** + +box "OSD Browser" +participant "visualization" as viz +' participant "Timelion/Vega/TSVB" as viz_s +participant "expression" as e +end box + + +box "OSD Server-Data Plugin" #LightBlue +participant "SearchSource\n(High Level API)" as s +participant "Search Strategy\n(Low Level API)" as ss +end box + +box "OSD Server-Data Source Plugin" #LightBlue +participant "OpenSearch Data Service" as ods #LightGreen +end box + +box "OSD Server-Core" #LightBlue +participant "OpenSearch Service" as os +participant "Saved Object Service" as sos +end box + +box "OpenSearch" +database "OSD metadata" as oi +database "data index" as default_di +end box + +box "DataSource[OS]" +database "data index" as datasource_di +end box + +sos --> os: depends on +ods --> sos: depends on + +viz -> e: execute expression pipeline +e -> s: create SearchSource +s -> ss: call .search() +alt viz_type = Timelion/TSVB/Vega +viz -> ss: call .search() +end + +ss -> os: get client +alt if (datasource == true) +ss 
-> ods: get datasource client +alt if exists in datasource client pool +ods -> ods: find client +end + +ods -> sos: call saved_obj_client +sos -> oi: get datasource metadata +oi --> sos: +sos --> ods: datasource metadata +ods --> ods: create datasource client \n and add to pool +ods --> ss: return client + +end +os --> ss: return client +alt if (client is datasource Client) +ss -> datasource_di: query +datasource_di --> ss: data +end + +ss -> default_di: query +default_di --> ss: data + +ss --> s: data +s --> e: data +e --> viz: render data + +skinparam BoxPadding 15 +@enduml \ No newline at end of file diff --git a/docs/multi-datasource/resources/dsm_flow.puml b/docs/multi-datasource/resources/dsm_flow.puml new file mode 100644 index 00000000000..a4da756cc7a --- /dev/null +++ b/docs/multi-datasource/resources/dsm_flow.puml @@ -0,0 +1,45 @@ +@startuml +title DataSource Management in Stack Management + + +:DataSource Owner: as DSO +:DataSource User: as DSU + +(Stack Management Page) as (Page-Stack) +(DataSource Management Page) as (Page-DM) +(Add new DataSource Page) as (Page-AND) +(DataSource Grid View) as (Component-DSG) +(DataSource Edit Page) as (Page-DSEP) +(Delete DataSource Button) as (Component-DelDS) +(Export DataSource Button) as (Component-ExDS) +(Import DataSource Button) as (Component-ImDS) + + +(DataSource Name) as (Component-DSName) +(DataSource Type) as (Component-DSType) +(DataSource Endpoint) as (Component-DSEndpoint) +(DataSource Credential) as (Component-DSCredential) + +DSO -> (Page-Stack) +(Page-Stack) -> (Page-DM) +(Page-DM) -> (Page-AND) : Add +(Page-AND) -> (Page-DM) : Save +(Page-DM) ...> (Component-DSG): View +(Page-DM) <..> (Component-DelDS) : Delete +(Page-DM) <..> (Component-ExDS) : Export +(Page-DM) <..> (Component-ImDS) : Import + +(Component-DSG) -> (Page-DSEP): Edit + +(Page-DSEP) .....> (Component-DSName) : Edit +(Page-DSEP) .....> (Component-DSType): Edit +(Page-DSEP) .....> (Component-DSEndpoint): Edit +(Page-DSEP) .....> 
(Component-DSCredential): Select + +note top of DSO + The DataSource Owner has access to manage + all DataSources. + When security is enabled, users can only see + DataSources added by them +end note +@enduml \ No newline at end of file diff --git a/docs/multi-datasource/user_stories.md b/docs/multi-datasource/user_stories.md new file mode 100644 index 00000000000..2c4dd97721a --- /dev/null +++ b/docs/multi-datasource/user_stories.md @@ -0,0 +1,65 @@ +# OpenSearch Dashboards Multiple OpenSearch Data Source Support User Stories + +Today, OpenSearch Dashboards can only connect to a single OpenSearch cluster, whose endpoint is configured in the `opensearch_dashboards.yml` config file. We want to allow OpenSearch Dashboards users to dynamically add/update/remove OpenSearch-compatible endpoints, and then do their analytics work with data in those OpenSearch data stores. + +RFC: https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388 + +This document discusses the user experience of the OpenSearch Dashboards multiple data source support.
+ +## User Story + +### Current user experience + +- The OpenSearch Dashboards admin sets up the OpenSearch Dashboards service and configures the OpenSearch endpoint in `opensearch_dashboards.yml` + - Both the OpenSearch Dashboards metadata index (`opensearch_dashboards` index) and the data indices are saved in the same OpenSearch cluster +- OpenSearch Dashboards users can work with visualizations. Usually they will + - Create/update index patterns + - Create/update visualizations, each of which is built on top of one index pattern + - Create/update dashboards using a group of visualizations + - Run ad hoc queries against an index pattern using the Discover feature + - View index patterns/visualizations/dashboards +- OpenSearch Dashboards users can work with analytics functions, such as Alerting and Anomaly Detection (AD) + +### Expected user experience with multiple data sources + +We are planning to introduce a new `data-source` model to describe an OpenSearch data source, and to let index patterns refer to a `data-source`. + +- The OpenSearch Dashboards admin sets up the OpenSearch Dashboards service and configures the OpenSearch **metadata store endpoint** in `opensearch_dashboards.yml` + - The metadata store OpenSearch cluster only saves the `.kibana` index; data indices can be saved in other OpenSearch stores +- Users will need to have a data source before they can do any visualization or analytics work with OpenSearch Dashboards + - Users can create/update/view data sources + - Users need to specify a data source when creating a new index pattern; the data source cannot be changed after the index pattern is created + - The create/update experience for visualizations and dashboards remains the same as it is today. + - The view experience for index patterns/visualizations/dashboards remains the same as it is today. +- When users want to work with analytics features like AD and alerting, they need to specify a data source to work with.
(We may consider adding a default data source concept.) + +## UI Change + +Multiple data source support and the introduction of the data source model require several UI changes in OpenSearch Dashboards. + +### Data source management + +![img](./img/dsm_flow.png) + +Data source, as a new saved object type, should have a management page, like index pattern. + +We will need to + +- add a new data source entry in the stack management Nav app, with a data source list table +- add a data source detail page that shows detailed information about a specific data source, such as URL, auth type, and endpoint capabilities + +### Index Pattern + +- Index pattern creation flow: With data sources, users will need to specify which data source to use when creating a new index pattern. +- Index pattern detail page: On the index pattern detail page, we will need to show which data source the index pattern uses. +- Data source selector for plugins: When OpenSearch Dashboards users are working with analytics functions like Alerting and AD, we will want to allow them to switch between data sources. + +## Appendix + +### Data source security + +For the initial launch with the OpenSearch 2.4 preview, we do not plan to change the security design of OpenSearch. + +When creating a data source, users need to provide the endpoint URL, username, and password (if using basic authentication). The OpenSearch Dashboards service will encrypt the username and password when storing them in the metadata store. + +Data source is a new type of OpenSearch Dashboards saved object. In the current OpenSearch security model, access control on data source documents is the same as on other saved object documents. In other words, data source documents will be accessible by any user who has access to the tenant.
diff --git a/src/plugins/data_source/README.md b/src/plugins/data_source/README.md index fdf6baab783..a76e9a1fb5a 100755 --- a/src/plugins/data_source/README.md +++ b/src/plugins/data_source/README.md @@ -16,7 +16,7 @@ Update the following configuration in the `opensearch_dashboards.yml` file to ap - Current auditor configuration: -``` +```yml data_source.audit.appender.kind: 'file' data_source.audit.appender.layout.kind: 'pattern' data_source.audit.appender.path: '/tmp/opensearch-dashboards-data-source-audit.log' @@ -24,10 +24,11 @@ data_source.audit.appender.path: '/tmp/opensearch-dashboards-data-source-audit.l 3. The default encryption-related configuration parameters are: -``` +```yml data_source.encryption.wrappingKeyName: 'changeme' data_source.encryption.wrappingKeyNamespace: 'changeme' -data_source.encryption.wrappingKey: [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] +data_source.encryption.wrappingKey: + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] ``` Note that if any of the encryption keyring configuration values change (wrappingKeyName/wrappingKeyNamespace/wrappingKey), none of the previously-encrypted credentials can be decrypted; therefore, credentials of previously created data sources must be updated to continue use. @@ -79,7 +80,7 @@ a. Envelope encryption - provides strong protection on data keys. Read more deta b. Key derivation with HMAC - KDF with SHA-384 protects against accidental reuse of a data encryption keys and reduces the risk of overusing data keys. -c. Signature algorithm - ECDSA with P-384 and SHA-384. Under multiple data source case, data source documents stored on OpenSearch can be modified / replaced by attacker. With ECDSA signature, ciphertext decryption will fail if it’s getting pullted. No one will be able to create another signature that verifies with the public key because the private key has been dropped. +c. 
Signature algorithm - ECDSA with P-384 and SHA-384. In the multiple data source case, data source documents stored on OpenSearch can be modified / replaced by an attacker. With the ECDSA signature, ciphertext decryption will fail if the ciphertext has been polluted. No one will be able to create another signature that verifies with the public key because the private key has been dropped. Please check https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1756 for more details. @@ -88,5 +89,14 @@ Please check https://github.com/opensearch-project/OpenSearch-Dashboards/issues/ ## Development See the [OpenSearch Dashboards contributing -guide](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/CONTRIBUTING.md) for instructions -setting up your development environment. +guide](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/CONTRIBUTING.md) for instructions on setting up your development environment. + +### Design Documents + +- [High level design doc](../../../docs/multi-datasource/high_level_design.md) +- [User stories](../../../docs/multi-datasource/user_stories.md) +- [Client management detailed design](../../../docs/multi-datasource/client_management_design.md) + +### Integrate with the multiple data source feature + +TODO: [#2455](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/2455)
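
The README hunk above shows `data_source.encryption.wrappingKey` defaulting to an array of 32 zeros. The following is a minimal sketch of how a random replacement value could be produced with Node's built-in `crypto` module. It assumes the key is expected to be 32 bytes long, as the default suggests; `generateWrappingKey` and `toYamlLine` are hypothetical helper names for illustration and are not part of the plugin.

```ts
import { randomBytes } from 'crypto';

// Assumption: the wrapping key is 32 bytes, matching the default array in the README.
const WRAPPING_KEY_LENGTH = 32;

// Hypothetical helper: returns cryptographically random bytes as a plain number array.
function generateWrappingKey(length: number = WRAPPING_KEY_LENGTH): number[] {
  return Array.from(randomBytes(length));
}

// Hypothetical helper: formats a key as a single-line YAML flow sequence
// suitable for pasting into opensearch_dashboards.yml.
function toYamlLine(key: number[]): string {
  return `data_source.encryption.wrappingKey: [${key.join(', ')}]`;
}

console.log(toYamlLine(generateWrappingKey()));
```

As the README notes, changing any of the keyring values means previously encrypted credentials can no longer be decrypted, so a generated key would need to be set before data sources are created or followed by a credential update.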