Commit

[doc][cdc] Updated diagrams (#23262)
* updated diagrams

* Apply suggestions from code review

Co-authored-by: Aishwarya Chakravarthy  <achakravarthy@yugabyte.com>

---------

Co-authored-by: Aishwarya Chakravarthy <achakravarthy@yugabyte.com>
ddhodge and aishwarya24 authored Jul 24, 2024
1 parent 89e434e commit 1b3585f
Showing 4 changed files with 7 additions and 7 deletions.
@@ -16,16 +16,14 @@ type: docs

## Architecture

![Stateless CDC Service](/images/architecture/stateless_cdc_service.png)

Every YB-TServer has a `CDC service` that is stateless. The main APIs provided by the CDC service are the following:

- `createCDCSDKStream` API for creating the stream on the database.
- `getChangesCDCSDK` API that can be used by the client to get the latest set of changes.

## CDC streams
![Stateless CDC Service](/images/architecture/stateless_cdc_service.png)

Creating a new CDC stream returns a stream UUID. This is facilitated via the [yb-admin](../../../admin/yb-admin/#change-data-capture-cdc-commands) tool.
## CDC streams

YugabyteDB automatically splits user tables into multiple shards (also called tablets) using either a hash- or range-based strategy. The primary key for each row in the table uniquely identifies the tablet in which the row is located.
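
To make the hash-based strategy concrete, here is a minimal Python sketch of mapping a primary key to a tablet. It is an illustration only, not YugabyteDB's implementation: the 16-bit hash space, the use of MD5 as a stand-in hash, and the name `tablet_for_key` are all assumptions made for this example.

```python
import hashlib

# Assumption for illustration: a 16-bit hash space split evenly across tablets.
HASH_SPACE = 0x10000


def tablet_for_key(primary_key: str, num_tablets: int) -> int:
    """Return the index of the tablet that owns primary_key (hypothetical helper)."""
    # MD5 is only a stand-in hash here; the real hash function is not shown in this doc.
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    hash_code = int.from_bytes(digest[:2], "big")  # value in [0, 0xFFFF]
    range_size = HASH_SPACE // num_tablets
    # The last tablet absorbs any remainder of the hash space.
    return min(hash_code // range_size, num_tablets - 1)
```

The same key always maps to the same tablet, which is why all changes to a given row can be read from a single tablet's change stream.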

@@ -39,11 +37,13 @@ The Debezium YugabyteDB connector captures row-level changes in the schemas of a

![How does CDC work](/images/explore/cdc-overview-work.png)

The connector produces a change event for every row-level insert, update, and delete operation that was captured, and sends change event records for each table in a separate Kafka topic. Client applications read the Kafka topics that correspond to the database tables of interest, and can react to every row-level event they receive from those topics. For each table, the default behavior is that the connector streams all generated events to a separate Kafka topic for that table. Applications and services consume data change event records from that topic.
The core primitive of CDC is the _stream_. Streams can be enabled and disabled on databases. You can specify which tables to include or exclude. Every change to a watched database table is emitted as a record in a configurable format to a configurable sink. Streams scale to any YugabyteDB cluster independent of its size and are designed to impact production traffic as little as possible.

Creating a new CDC stream returns a stream UUID. This is facilitated via the [yb-admin](../../../admin/yb-admin/#change-data-capture-cdc-commands) tool. A stream ID is created first, per database. You configure the maximum batch size in YugabyteDB, while the polling frequency is configured on the connector side.

The core primitive of CDC is the _stream_. Streams can be enabled and disabled on databases. Every change to a watched database table is emitted as a record in a configurable format to a configurable sink. Streams scale to any YugabyteDB cluster independent of its size and are designed to impact production traffic as little as possible.
Connector tasks can consume changes from multiple tablets. At-least-once delivery is guaranteed. In turn, connector tasks write to the Kafka cluster, and tasks don't need to match Kafka partitions. Tasks can be independently scaled up or down.

![How does CDC work](/images/explore/cdc-overview-work3.png)
The connector produces a change event for every row-level insert, update, and delete operation that was captured, and sends change event records for each table in a separate Kafka topic. Client applications read the Kafka topics that correspond to the database tables of interest, and can react to every row-level event they receive from those topics. For each table, the default behavior is that the connector streams all generated events to a separate Kafka topic for that table. Applications and services consume data change event records from that topic. All changes for a row (or rows in the same tablet) are received in the order in which they happened. A checkpoint per stream ID and tablet is updated in a state table after a successful write to Kafka brokers.
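
The checkpointing behavior described above can be sketched in Python as a small simulation. This is not the connector's code: `poll_and_publish`, the in-memory `checkpoints` dictionary (standing in for the state table), and the `topic` list (standing in for a Kafka topic) are hypothetical names chosen for this example. The sketch shows why advancing the checkpoint only after a successful write gives at-least-once delivery: a failure between the write and the checkpoint update causes the same batch to be sent again.

```python
from typing import Dict, List, Tuple

# (stream_id, tablet_id) -> index of the next change to read (stand-in for the state table)
Checkpoints = Dict[Tuple[str, str], int]


def poll_and_publish(
    stream_id: str,
    tablet_id: str,
    changes: List[str],
    checkpoints: Checkpoints,
    topic: List[str],
    batch_size: int,
    fail_before_checkpoint: bool = False,
) -> None:
    """Read one batch from a tablet's ordered changes and publish it (hypothetical helper)."""
    key = (stream_id, tablet_id)
    start = checkpoints.get(key, 0)
    batch = changes[start:start + batch_size]

    # Stand-in for a successful write to the Kafka brokers.
    topic.extend(batch)

    if fail_before_checkpoint:
        # Simulated crash: the batch was written but the checkpoint was not advanced.
        return

    # The checkpoint moves forward only after the write has succeeded.
    checkpoints[key] = start + len(batch)
```

Calling the function once with `fail_before_checkpoint=True` and then again normally publishes the first batch twice, in the original order. Consumers in such a design would need to tolerate duplicates, but they would not miss a change.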

## CDC guarantees

Binary file modified docs/static/images/explore/cdc-overview-work.png
Binary file modified docs/static/images/explore/cdc-overview-work2.png