From 19784571c49cfe6a4cccba69aedf0d183dc00d88 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Mon, 16 Sep 2024 22:51:26 +0000
Subject: [PATCH] Update headings and punctuation in sycamore page (#8301)

* Update sycamore.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _tools/sycamore.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
(cherry picked from commit 242cc6b1e743f742cf1774ef74b545788ec9c318)
Signed-off-by: github-actions[bot]
---
 _tools/sycamore.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/_tools/sycamore.md b/_tools/sycamore.md
index 7ce55931ac..9b3986dbf3 100644
--- a/_tools/sycamore.md
+++ b/_tools/sycamore.md
@@ -11,7 +11,7 @@ has_children: false
 
 To get started, visit the [Sycamore documentation](https://sycamore.readthedocs.io/en/stable/sycamore/get_started.html).
 
-# Sycamore ETL pipeline structure
+## Sycamore ETL pipeline structure
 
 A Sycamore extract, transform, load (ETL) pipeline applies a series of transformations to a [DocSet](https://sycamore.readthedocs.io/en/stable/sycamore/get_started/concepts.html#docsets), which is a collection of documents and their constituent elements (for example, tables, blocks of text, or headers). At the end of the pipeline, the DocSet is loaded into OpenSearch vector and keyword indexes.
 
@@ -19,7 +19,7 @@ A typical pipeline for preparing unstructured data for vector or hybrid search i
 
 * Read documents into a [DocSet](https://sycamore.readthedocs.io/en/stable/sycamore/get_started/concepts.html#docsets).
 * [Partition documents](https://sycamore.readthedocs.io/en/stable/sycamore/transforms/partition.html) into structured JSON elements.
-* Extract metadata, filter, and clean data using [transforms](https://sycamore.readthedocs.io/en/stable/sycamore/APIs/docset.html).
+* Extract metadata and filter and clean data using [transforms](https://sycamore.readthedocs.io/en/stable/sycamore/APIs/docset.html).
 * Create [chunks](https://sycamore.readthedocs.io/en/stable/sycamore/transforms/merge.html) from groups of elements.
 * Embed the chunks using the model of your choice.
 * [Load](https://sycamore.readthedocs.io/en/stable/sycamore/connectors/opensearch.html) the embeddings, metadata, and text into OpenSearch vector and keyword indexes.
@@ -27,7 +27,7 @@ A typical pipeline for preparing unstructured data for vector or hybrid search i
 
 For an example pipeline that uses this workflow, see [this notebook](https://github.com/aryn-ai/sycamore/blob/main/notebooks/opensearch_docs_etl.ipynb).
 
-# Install Sycamore
+## Install Sycamore
 
 We recommend installing the Sycamore library using `pip`. The connector for OpenSearch can be specified and installed using extras. For example:
 
@@ -45,4 +45,4 @@ pip install sycamore-ai[opensearch,local-inference]
 
 ## Next steps
 
-For more information, visit the [Sycamore documentation](https://sycamore.readthedocs.io/en/stable/sycamore/get_started.html).
\ No newline at end of file
+For more information, visit the [Sycamore documentation](https://sycamore.readthedocs.io/en/stable/sycamore/get_started.html).
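The pipeline the patched page describes (read documents → partition into elements → filter/clean → merge into chunks → embed → emit index-ready records) can be sketched as a toy, dependency-free transform chain. Note that every class and method name below (`DocSet`, `partition`, `merge`, and so on) is an illustrative stand-in for the concepts on the page, not the real Sycamore API:

```python
# Toy illustration of a DocSet-style transform chain; NOT the real Sycamore API.
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str  # e.g. "text", "chunk"
    text: str


@dataclass
class Doc:
    source: str
    elements: list = field(default_factory=list)


class DocSet:
    """Minimal stand-in for a collection of documents flowing through a pipeline."""

    def __init__(self, docs):
        self.docs = docs

    def partition(self):
        # Split each raw document into structured elements (here: one per nonblank line).
        for doc in self.docs:
            doc.elements = [
                Element("text", line) for line in doc.source.splitlines() if line.strip()
            ]
        return self

    def filter_elements(self, predicate):
        # Cleaning/filtering step: keep only elements that pass the predicate.
        for doc in self.docs:
            doc.elements = [e for e in doc.elements if predicate(e)]
        return self

    def merge(self, max_chars=80):
        # Chunking step: merge consecutive elements up to max_chars per chunk.
        for doc in self.docs:
            chunks, buf = [], ""
            for e in doc.elements:
                if buf and len(buf) + len(e.text) + 1 > max_chars:
                    chunks.append(Element("chunk", buf))
                    buf = ""
                buf = (buf + " " + e.text).strip()
            if buf:
                chunks.append(Element("chunk", buf))
            doc.elements = chunks
        return self

    def embed(self, embed_fn):
        # Embedding step: a real pipeline would call a model here.
        for doc in self.docs:
            for e in doc.elements:
                e.vector = embed_fn(e.text)
        return self

    def to_records(self):
        # "Load" step: emit index-ready records (text + embedding + metadata).
        return [
            {"text": e.text, "vector": e.vector, "kind": e.kind}
            for doc in self.docs
            for e in doc.elements
        ]


raw = Doc(source="Sycamore is an ETL engine.\nIt loads data into OpenSearch.\n\nFooter")
records = (
    DocSet([raw])
    .partition()
    .filter_elements(lambda e: e.text != "Footer")  # drop boilerplate
    .merge(max_chars=60)
    .embed(lambda text: [float(len(text))])  # toy 1-dimensional "embedding"
    .to_records()
)
```

The chaining style mirrors the page's description of a DocSet passing through successive transforms; each method mutates the documents and returns the set so steps compose left to right.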