compatible: Add sync_docs.yaml #220

Merged
merged 68 commits into from
Aug 16, 2024
Changes from 1 commit
fd58087
First commit
a-velasco Aug 5, 2024
8821fbc
Misc PR review fixes
a-velasco Aug 7, 2024
71f818c
Removing run schedule since there are no longer API constraints
a-velasco Aug 7, 2024
330982e
Using GITHUB_TOKEN instead of PA TOKEN
a-velasco Aug 7, 2024
a2e0463
Update .github/workflows/_sync_docs_v2.yaml
a-velasco Aug 8, 2024
6b8e227
Update .github/workflows/_sync_docs_v2.yaml
a-velasco Aug 8, 2024
0561751
Update .github/workflows/_sync_docs_v2.yaml
a-velasco Aug 8, 2024
ac42f67
Update .github/workflows/_sync_docs_v2.md
a-velasco Aug 8, 2024
ee2866b
Update python/cli/data_platform_workflows_cli/sync_docs_v2.py
a-velasco Aug 8, 2024
c92536f
Update python/cli/data_platform_workflows_cli/sync_docs_v2.py
a-velasco Aug 8, 2024
3c43657
Update .github/workflows/_sync_docs_v2.yaml
a-velasco Aug 8, 2024
64f0a61
Small edits
a-velasco Aug 8, 2024
17e4327
Replaced original sync_docs with v2 and removed experimental file pre…
a-velasco Aug 8, 2024
39cd6c0
Merge branch 'download-discourse-topics' of github.com:canonical/data…
a-velasco Aug 8, 2024
bd3c59c
Update sync_docs.md
a-velasco Aug 8, 2024
0d196b5
Update README.md
a-velasco Aug 8, 2024
908c137
Update sync_docs.yaml
a-velasco Aug 8, 2024
67f50a4
Update .github/workflows/sync_docs.md
a-velasco Aug 9, 2024
ee6932f
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 9, 2024
5731b97
Update .github/workflows/sync_docs.yaml
a-velasco Aug 9, 2024
9f46bd4
Update .github/workflows/sync_docs.yaml
a-velasco Aug 9, 2024
d645e02
Update README.md
a-velasco Aug 9, 2024
ca1164a
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 9, 2024
6460585
Formatted with isort
a-velasco Aug 9, 2024
d05d5ff
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 9, 2024
5316d89
Update sync_docs.md
a-velasco Aug 9, 2024
aa3c9ed
Update sync_docs.md
a-velasco Aug 9, 2024
7a8e201
Update sync_docs.md
a-velasco Aug 9, 2024
eafdffd
Update sync_docs.py
a-velasco Aug 9, 2024
917485d
Reformatted with black
a-velasco Aug 9, 2024
785629c
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 9, 2024
d9d979a
Update sync_docs.md
a-velasco Aug 9, 2024
e051a00
Update sync_docs.yaml
a-velasco Aug 12, 2024
69c58fe
Update sync_docs.yaml
a-velasco Aug 12, 2024
e3cd611
test
a-velasco Aug 12, 2024
3225cd9
Update sync_docs.yaml
a-velasco Aug 12, 2024
c369e5c
Update sync_docs.yaml
a-velasco Aug 12, 2024
c7e6000
Add logging
carlcsaposs-canonical Aug 12, 2024
46f8dbc
Fixed bug where first line of navtable was getting automatically filt…
a-velasco Aug 12, 2024
4e33b34
Update .github/workflows/sync_docs.yaml
a-velasco Aug 12, 2024
6cf1cdd
temp (debug)
a-velasco Aug 12, 2024
7bdf969
temp (debug)
a-velasco Aug 12, 2024
2211711
temp (debug)
a-velasco Aug 12, 2024
ac7d9dc
Update sync_docs.yaml
a-velasco Aug 12, 2024
fd0a2ed
Update sync_docs.yaml
a-velasco Aug 12, 2024
ba261b1
Added overview topic download and support for valid non-diataxis topics
a-velasco Aug 12, 2024
3a81e0c
Update .github/workflows/sync_docs.yaml
a-velasco Aug 12, 2024
f7c2c03
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 12, 2024
2adc0db
Moved download logic into Topic class
a-velasco Aug 12, 2024
670c897
Merge branch 'download-discourse-topics' of github.com:canonical/data…
a-velasco Aug 12, 2024
663ffc0
Update sync_docs.md
a-velasco Aug 12, 2024
0bde819
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 12, 2024
9b77eac
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 12, 2024
53944e4
Rename topic download function
a-velasco Aug 12, 2024
80b8347
Some rephrasing
a-velasco Aug 12, 2024
9c0d0f2
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 12, 2024
c6d1a5f
Update python/cli/data_platform_workflows_cli/sync_docs.py
a-velasco Aug 12, 2024
fdeaf62
Update sync_docs.md
a-velasco Aug 12, 2024
71981b5
Formatting
a-velasco Aug 12, 2024
982fce9
Update .github/workflows/sync_docs.md
a-velasco Aug 13, 2024
f121d08
Update .github/workflows/sync_docs.md
a-velasco Aug 13, 2024
d1b71fb
Update sync_docs.md
a-velasco Aug 13, 2024
46b5238
Update .github/workflows/sync_docs.md
a-velasco Aug 13, 2024
d33bea5
Update .github/workflows/sync_docs.md
a-velasco Aug 13, 2024
dccdd38
Update .github/workflows/sync_docs.md
a-velasco Aug 13, 2024
d048543
Update sync_docs.md
a-velasco Aug 13, 2024
e885889
Update sync_docs.md
a-velasco Aug 13, 2024
5a80297
Fixed bad indentation in yaml template
a-velasco Aug 13, 2024
74 changes: 74 additions & 0 deletions .github/workflows/_sync_docs_v2.md
@@ -0,0 +1,74 @@
Workflow file: [_sync_docs_v2.yaml](_sync_docs_v2.yaml)

> [!WARNING]
> Subject to **breaking changes on patch release**. `_sync_docs_v2.yaml` is experimental & not part of the public interface.

## Usage
Add a `.yaml` file with the following content to `.github/workflows/`:

```yaml
# Copyright 2024 Canonical Ltd.
# See LICENSE file for licensing details.
name: Sync Discourse docs (v2)

on:
  workflow_dispatch:
  schedule:
    - cron:  # Refer to Run schedule below

jobs:
  sync-docs-v2:
    name: Sync docs from Discourse (v2)
    uses: canonical/data-platform-workflows/.github/workflows/_sync_docs_v2.yaml@main
    permissions:
      contents: write  # Needed to push branch & tag
      pull-requests: write  # Needed to create PR
```

## Run schedule

Cron job schedules for Data Platform repositories

### SQL
| repository | run time | cron |
|:-------------------------:|:--------:|:-------------:|
| mysql-k8s-operator | 12:00 AM | `00 00 * * *` |
| mysql-operator | 12:10 AM | `10 00 * * *` |
| mysql-test-app | 12:20 AM | `20 00 * * *` |
| mysql-router-k8s-operator | 12:30 AM | `30 00 * * *` |
| mysql-router-operator | 12:40 AM | `40 00 * * *` |
| postgresql-k8s-operator | 12:50 AM | `50 00 * * *` |
| postgresql-operator | 01:00 AM | `00 01 * * *` |
| postgresql-test-app | 01:10 AM | `10 01 * * *` |
| pgbouncer-k8s-operator | 01:20 AM | `20 01 * * *` |
| pgbouncer-operator | 01:30 AM | `30 01 * * *` |

### NoSQL
| repository | run time | cron |
|:------------------------------:|:--------:|:-------------:|
| mongodb-k8s-operator | 01:40 AM | `40 01 * * *` |
| mongodb-operator | 01:50 AM | `50 01 * * *` |
| mongos-operator | 02:00 AM | `00 02 * * *` |
| opensearch-k8s-operator | 02:10 AM | `10 02 * * *` |
| opensearch-operator | 02:20 AM | `20 02 * * *` |
| opensearch-dashboards-operator | 02:30 AM | `30 02 * * *` |
| redis-k8s-operator | 02:40 AM | `40 02 * * *` |
| redis-operator | 02:50 AM | `50 02 * * *` |

### Big Data
| repository | run time | cron |
|:---------------------------------:|:--------:|:-------------:|
| kafka-k8s-operator | 03:00 AM | `00 03 * * *` |
| kafka-operator | 03:10 AM | `10 03 * * *` |
| kafka-test-app | 03:20 AM | `20 03 * * *` |
| zookeeper-k8s-operator | 03:30 AM | `30 03 * * *` |
| zookeeper-operator | 03:40 AM | `40 03 * * *` |
| spark-history-server-k8s-operator | 03:50 AM | `50 03 * * *` |
| spark-client-snap | 04:00 AM | `00 04 * * *` |

### Other
| repository | run time | cron |
|:------------------:|:--------:|:-------------:|
| data-integrator | 04:10 AM | `10 04 * * *` |
| s3-integrator | 04:20 AM | `20 04 * * *` |
| data-platform-libs | 04:30 AM | `30 04 * * *` |
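
The tables above give each repository its own slot, ten minutes after the previous one, starting at 00:00. The following is a minimal sketch of how such staggered cron expressions could be generated. The helper name `staggered_cron` and its parameters are hypothetical and are not part of this repository. Note that GitHub Actions evaluates `schedule` cron expressions in UTC.

```python
def staggered_cron(repositories, start_minute=0, step_minutes=10):
    """Map each repository to a daily cron expression, one slot per repository.

    Hypothetical helper for illustration only: slots start at `start_minute`
    minutes after midnight and are spaced `step_minutes` apart.
    """
    schedule = {}
    for index, repository in enumerate(repositories):
        minutes_after_midnight = start_minute + index * step_minutes
        hour, minute = divmod(minutes_after_midnight, 60)
        if hour >= 24:
            raise ValueError("Too many repositories to fit in one day")
        # Same zero-padded "minute hour * * *" layout as the tables above
        schedule[repository] = f"{minute:02d} {hour:02d} * * *"
    return schedule


sql_repositories = ["mysql-k8s-operator", "mysql-operator", "mysql-test-app"]
for name, cron in staggered_cron(sql_repositories).items():
    print(f"{name}: {cron}")
```

Passing `start_minute` lets a second group continue where the previous one stopped, for example `start_minute=100` for a group whose first slot is 01:40.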
51 changes: 51 additions & 0 deletions .github/workflows/_sync_docs_v2.yaml
@@ -0,0 +1,51 @@
on:
  workflow_call:
    inputs:
      reviewers:
        description: Comma-separated list of GitHub usernames to request to review pull request (e.g. "canonical/data-platform-engineers,octocat")
        required: false
        type: string
    secrets:
      token:
        description: |
          GitHub App token or personal access token (not GITHUB_TOKEN)

          Permissions needed for App token:
          - Access: Read & write for Repository permissions: Pull requests
          - Access: Read & write for Repository permissions: Contents
          - If GitHub team is requested for pull request review,
            Access: Read-only for Organization permissions: Members

          Permissions needed for personal access token: write access to repository, read:org
          Personal access tokens with fine grained access are not supported (by GraphQL API, which is used by GitHub CLI).

          The GITHUB_TOKEN can create a pull request or push a branch, but `on: pull_request` workflows will not be triggered.

          Source: https://github.com/peter-evans/create-pull-request/blob/main/docs/concepts-guidelines.md#triggering-further-workflow-runs
        required: true

jobs:
  sync-docs-v2:
    name: Sync Discourse docs (v2)
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Get workflow version
        id: workflow-version
        uses: canonical/get-workflow-version-action@v1
        with:
          repository-name: canonical/data-platform-workflows
          file-name: _sync_docs_v2.yaml
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Install CLI
        run: pipx install git+https://github.com/canonical/data-platform-workflows@'${{ steps.workflow-version.outputs.sha }}'#subdirectory=python/cli
      - name: Checkout
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.token }}
      - name: Download Discourse docs
        id: sync-docs-v2
        run:
      - name: Push `sync-docs-v2` branch

      - name: Create pull request
137 changes: 137 additions & 0 deletions python/cli/data_platform_workflows_cli/download_discourse_topics.py
@@ -0,0 +1,137 @@
import csv
import dataclasses
import pathlib
import re

import requests
import yaml

NAVTABLE_START_MARKER = "[details=Navigation]"
NAVTABLE_END_MARKER = "[/details]"

def get_topic(topic_id_: str):
    """Get markdown content of a discourse.charmhub.io topic"""

    response = requests.get(
        f"https://discourse.charmhub.io/raw/{topic_id_}/1"
    )  # "/1" for post 1

    response.raise_for_status()
    return response.text


class NoTopicToDownload(Exception):
    """No Discourse topic is available to download

    Happens if:
    - no "Navlink" is provided (e.g. for a navigation group)
    - "Navlink" is an external URL
    """

class NoTableToParse(Exception):
    """No markdown navigation table is available

    Happens if:
    - Navtable markers do not exist in the topic (i.e. [details=Navigation][/details])
    - The navtable markers in the topic have a typo
    - The table is empty
    """

@dataclasses.dataclass
class Topic:
    """Discourse topic to download"""

    id: str
    path: pathlib.Path

    @classmethod
    def from_csv_row(cls, row_: dict):
        # Example `row_`: {'Level': '2', 'Path': 't-overview', 'Navlink': '[Overview](/t/9707)'}

        # Extract Discourse topic ID from "Navlink"
        # Example `link`: "/t/9707"
        link = re.fullmatch(r"\[.*?]\((.*?)\)", row_["Navlink"]).group(1)
        if link == "":
            raise NoTopicToDownload
        elif link.startswith("http") and "charmhub.io" not in link:
            # Ignore external links (e.g. "https://canonical.com/data/docs/postgresql/iaas")
            raise NoTopicToDownload

        match = re.fullmatch(r"/t/([0-9]+)", link)
        if not match:
            raise ValueError(
                f'Invalid navlink "{link}". Expected something like "/t/9707"'
            )
        # Example `topic_id`: "9707"
        topic_id = match.group(1)

        # Determine local path to download Markdown file
        # Example `topic_slug`: "t-overview"
        topic_slug = row_["Path"]
        diataxis_directory = {
            "t-": "tutorial",
            "h-": "how-to",
            "r-": "reference",
            "e-": "explanation",
        }[topic_slug[:2]]

        # Example `path`: "docs/tutorial/t-overview.md"
        path = pathlib.Path("docs/") / diataxis_directory / f"{topic_slug}.md"

        return cls(topic_id, path)

def main():
    # Example `overview_topic_link`: "https://discourse.charmhub.io/t/charmed-postgresql-documentation/9710"
    overview_topic_link: str = yaml.safe_load(pathlib.Path("disc2github/metadata.yaml").read_text())["docs"]
    assert overview_topic_link.startswith("https://discourse.charmhub.io/")

    # Example `topic_id`: "9710"
    topic_id = overview_topic_link.split("/")[-1]
    overview_topic_markdown = get_topic(topic_id)

    # Example:
    # | Level | Path | Navlink |
    # |--------|--------|-------------|
    # | 1 | tutorial | [Tutorial]() |
    # | 2 | t-overview | [Overview](/t/9707) |
    # | 2 | t-set-up | [1. Set up the environment](/t/9709) |
    # | 2 | t-deploy | [2. Deploy PostgreSQL](/t/9697) |
    # | 1 | search | [Search](https://canonical.com/data/docs/postgresql/iaas) |

    # Search for table delimiters NAVTABLE_START_MARKER and NAVTABLE_END_MARKER
    start_index = overview_topic_markdown.find(NAVTABLE_START_MARKER)
    if start_index == -1:
        raise NoTableToParse("Could not find Navtable start marker " + NAVTABLE_START_MARKER + " in the overview topic")

    end_index = overview_topic_markdown.find(NAVTABLE_END_MARKER)
    if end_index == -1:
        raise NoTableToParse("Could not find Navtable end marker " + NAVTABLE_END_MARKER + " in the overview topic")

    start_index += len(NAVTABLE_START_MARKER)
    end_index = overview_topic_markdown.find(NAVTABLE_END_MARKER, start_index)

    table_raw = overview_topic_markdown[start_index:end_index].strip()  # remove leading and trailing whitespace
    if table_raw == "":
        raise NoTableToParse

    # Convert Markdown table to list[dict[str, str]]
    # (https://stackoverflow.com/a/78254495)
    rows: list[dict] = list(csv.DictReader(table_raw.split("\n"), delimiter="|"))

    # Remove first row (e.g. "|--------|--------|-------------|")
    # `csv.DictReader` already consumed the header line as field names,
    # so only the separator row needs to be dropped
    rows = rows[1:]
    rows: list[dict[str, str]] = [
        {key.strip(): value.strip() for key, value in row.items() if key != ''}
        for row in rows
    ]

    # Example `row`: {'Level': '2', 'Path': 't-overview', 'Navlink': '[Overview](/t/9707)'}
    for row in rows:
        try:
            topic = Topic.from_csv_row(row)
        except NoTopicToDownload:
            continue

        # Download topic markdown to `topic.path`
        topic.path.parent.mkdir(parents=True, exist_ok=True)
        # Use the ID of the row's topic (`topic.id`), not the overview topic's `topic_id`
        topic.path.write_text(get_topic(topic.id))
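
The navigation-table handling in `main()` can be tried without network access. The sketch below is an illustration under stated assumptions, not the module's actual interface: `SAMPLE_TOPIC` is made-up input, and `parse_navtable` and `topic_id_from_navlink` are hypothetical helper names. It reuses the same markers, the same `csv.DictReader` approach, and the same regular expressions as the file above.

```python
import csv
import re
from typing import Optional

NAVTABLE_START_MARKER = "[details=Navigation]"
NAVTABLE_END_MARKER = "[/details]"

# Made-up overview topic, used only for this illustration
SAMPLE_TOPIC = """\
Introductory text of the overview topic.

[details=Navigation]
| Level | Path | Navlink |
|--------|--------|-------------|
| 1 | tutorial | [Tutorial]() |
| 2 | t-overview | [Overview](/t/9707) |
| 1 | search | [Search](https://canonical.com/data/docs/postgresql/iaas) |
[/details]
"""


def parse_navtable(markdown: str) -> list[dict[str, str]]:
    """Return the navigation table rows found between the two markers."""
    start = markdown.find(NAVTABLE_START_MARKER)
    if start == -1:
        raise ValueError("Navtable start marker not found")
    start += len(NAVTABLE_START_MARKER)
    end = markdown.find(NAVTABLE_END_MARKER, start)
    if end == -1:
        raise ValueError("Navtable end marker not found")
    table = markdown[start:end].strip()
    # The header line becomes the field names; what remains is the
    # separator row followed by the data rows
    rows = list(csv.DictReader(table.split("\n"), delimiter="|"))
    rows = rows[1:]  # drop the "|--------|" separator row
    # The leading and trailing "|" produce an empty column name; skip it
    return [
        {key.strip(): value.strip() for key, value in row.items() if key}
        for row in rows
    ]


def topic_id_from_navlink(navlink: str) -> Optional[str]:
    """Return the Discourse topic ID, or None if there is nothing to download."""
    match = re.fullmatch(r"\[.*?]\((.*?)\)", navlink)
    if match is None:
        raise ValueError(f"Not a Markdown link: {navlink!r}")
    link = match.group(1)
    # Navigation groups have an empty link; external URLs are skipped
    if link == "" or (link.startswith("http") and "charmhub.io" not in link):
        return None
    topic = re.fullmatch(r"/t/([0-9]+)", link)
    if topic is None:
        raise ValueError(f'Invalid navlink "{link}". Expected something like "/t/9707"')
    return topic.group(1)


for row in parse_navtable(SAMPLE_TOPIC):
    print(row["Path"], topic_id_from_navlink(row["Navlink"]))
```

Returning `None` instead of raising `NoTopicToDownload` is a simplification made for this sketch; the module above signals the same two cases with an exception.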