[AIG] Add Evaluations section with Overview and How to set up Evaluations p… #17061

Closed · wants to merge 20 commits

Commits (20):

- `929c351` Add Evaluations section with Overview and How to set up Evaluations p… (daisyfaithauma, Sep 24, 2024)
- `220bd5d` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `ab3b1ac` Update index.mdx (kathayl, Sep 24, 2024)
- `4df655c` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `281bb24` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `03be66a` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `b368c07` fixed hyperlink (daisyfaithauma, Sep 25, 2024)
- `f3c841c` Merge branch 'aig-evaluations' of github.com:cloudflare/cloudflare-do… (daisyfaithauma, Sep 25, 2024)
- `36ff819` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 25, 2024)
- `8b422a5` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 25, 2024)
- `0224d68` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 25, 2024)
- `e077d68` Added hyperlinks on RAG and providers (daisyfaithauma, Sep 25, 2024)
- `eced4ed` Removed Rerun details (daisyfaithauma, Sep 25, 2024)
- `dd6e3c2` Update index.mdx (kathayl, Sep 26, 2024)
- `51648db` Update set-up-evaluations.mdx (kathayl, Sep 26, 2024)
- `5c9ea82` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 26, 2024)
- `7f00f51` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 26, 2024)
- `13a4e49` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 26, 2024)
- `15e5f53` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 26, 2024)
- `7babd78` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 26, 2024)

17 changes: 17 additions & 0 deletions src/content/docs/ai-gateway/observability/evaluations/index.mdx
@@ -0,0 +1,17 @@
---
title: Evaluations
pcx_content_type: navigation
order: 1
---

# Overview

Understanding your application’s performance is essential for optimizing and improving it. Developers often have different priorities, and finding the optimal solution involves balancing key factors such as cost, latency, and accuracy. For some, low-latency responses are critical, while others may prioritize accuracy or cost-efficiency.

AI Gateway’s Evaluations provides the data needed to make informed decisions on how to optimize your AI application. Whether it’s adjusting the model, provider, or prompt, this feature delivers insights into key metrics around performance, speed, and cost. It empowers developers to better understand their application’s behavior, ensuring improved accuracy, reliability, and customer satisfaction.

Datasets are collections of logs stored for analysis that can be used for an evaluation. You can create datasets by applying filters in the Logs or Evaluations tab, which helps narrow down specific logs for evaluation.

Our first step toward comprehensive AI evaluations starts with human-in-the-loop feedback (currently in open beta). Future updates will include automated scoring using large language models (LLMs), comparisons across multiple datasets, and prompt evaluations. These enhancements will help you make informed, data-driven decisions to ensure your applications are efficient and cost-effective.

[Learn how to set up an evaluation](https://developers.cloudflare.com/ai-gateway/observability/evaluations/set-up-evaluations/) including creating datasets, selecting evaluators, and running the evaluation process.

61 changes: 61 additions & 0 deletions src/content/docs/ai-gateway/observability/evaluations/set-up-evaluations.mdx
@@ -0,0 +1,61 @@
---
pcx_content_type: how-to
title: Set up Evaluations
sidebar:
order: 2
---

This guide walks you through the process of setting up an evaluation in AI Gateway. These steps are done in the [Cloudflare dashboard](https://dash.cloudflare.com/).

## 1. Select or create a dataset

Datasets are collections of logs stored for analysis that can be used in an evaluation. You can create datasets from either the Logs tab or the Evaluations tab by applying filters.

### Set up a dataset from the Logs tab

1. Apply filters (for example, by model or provider) to narrow down your logs.
2. Select **Save dataset** to store the filtered logs for future analysis.

You can manage datasets by selecting **Manage datasets** from the Logs tab. From here, you can:

- Edit
- Update
- Delete
- Save a new version of a dataset

:::note[Note]

Datasets can currently only be created with one item per filter (e.g., one model, one provider). Future updates will allow more flexibility in dataset creation.

:::
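
To make the dataset concept concrete, here is a minimal TypeScript sketch. It is purely illustrative: the `GatewayLog` and `DatasetFilter` shapes are assumptions made for this example, not an AI Gateway API, but the filter reflects the current one-value-per-filter constraint.

```ts
// Purely illustrative shapes; assumptions for this sketch, not an AI Gateway API.
interface GatewayLog {
  provider: string;
  model: string;
  cost?: number;     // USD; present only when cost data is available
  duration: number;  // request duration in milliseconds
  feedback?: 1 | -1; // thumbs-up / thumbs-down annotation, if any
}

// A dataset is conceptually "all logs matching a saved filter".
// Current limitation: one value per filter (one model, one provider).
interface DatasetFilter {
  provider: string;
  model: string;
}

function selectDataset(logs: GatewayLog[], filter: DatasetFilter): GatewayLog[] {
  return logs.filter(
    (log) => log.provider === filter.provider && log.model === filter.model,
  );
}
```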

## 2. Select evaluators

After creating a dataset, choose the evaluation parameters (a sketch of what each evaluator measures follows this list):

- Cost: Calculates the average cost of inference requests within the dataset (only for requests with [cost data](https://developers.cloudflare.com/ai-gateway/observability/costs/)).
- Speed: Calculates the average duration of inference requests within the dataset.
- Performance:
  - Human feedback: Measures performance based on human feedback. Users can annotate logs with a thumbs-up or thumbs-down in the Logs tab, and the evaluation scores performance from these annotations.
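
As a rough mental model of what these evaluators compute, here is a minimal TypeScript sketch. The log shape and the thumbs-up aggregation are assumptions made for illustration; AI Gateway performs these calculations for you in the dashboard.

```ts
// Assumed log fields for this sketch (a subset of the illustrative GatewayLog above).
type EvaluatedLog = {
  cost?: number;     // USD; present only when cost data is available
  duration: number;  // request duration in milliseconds
  feedback?: 1 | -1; // 1 = thumbs-up, -1 = thumbs-down
};

const avg = (values: number[]) =>
  values.reduce((sum, v) => sum + v, 0) / (values.length || 1);

function evaluate(dataset: EvaluatedLog[]) {
  // Cost: averaged only over logs that have cost data; other logs are excluded.
  const costs = dataset
    .filter((log) => log.cost !== undefined)
    .map((log) => log.cost!);

  // Speed: average duration across all logs in the dataset.
  const durations = dataset.map((log) => log.duration);

  // Performance (human feedback): share of annotated logs marked thumbs-up.
  // This is one plausible aggregation, not a documented formula.
  const annotated = dataset.filter((log) => log.feedback !== undefined);
  const thumbsUp = annotated.filter((log) => log.feedback === 1).length;

  // The dashboard also reports how many logs each metric was calculated from.
  return {
    avgCost: avg(costs),
    logsWithCost: costs.length,
    avgDuration: avg(durations),
    logsWithDuration: durations.length,
    positiveRate: thumbsUp / (annotated.length || 1),
    logsWithFeedback: annotated.length,
  };
}
```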

:::note[Note]

Additional evaluators will be introduced in future updates to expand performance analysis capabilities.

:::

## 3. Name, review, and run the evaluation

1. Create a unique name for your evaluation to reference it in the dashboard.
2. Review the selected dataset and evaluators.
3. Select **Run evaluation** to start the process.

## 4. Review and analyze results

Evaluation results will appear in the Evaluations tab. The results show the status of the evaluation (e.g., in progress, completed, or error). Metrics for the selected evaluators will be displayed, excluding any logs with missing fields. You will also see the number of logs used to calculate each metric.

Use these insights to adjust your setup and optimize based on your application's priorities. Based on the results, you may choose to:

- Change the model or provider (see the sketch after this list)
- Adjust your prompts
- Explore further optimizations, such as setting up Retrieval Augmented Generation (RAG)
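
For example, changing the model or provider usually means pointing your client at a different model, or a different provider path, on the same gateway, then running a fresh evaluation on the new logs. The sketch below is a hedged example: the account ID, gateway ID, and model name are placeholders, and you should confirm the exact request format for your provider in the AI Gateway provider documentation.

```ts
// Hedged sketch: placeholders throughout; confirm the exact URL and body shape
// for your provider in the AI Gateway provider documentation.
const ACCOUNT_ID = "your-account-id";
const GATEWAY_ID = "your-gateway-id";

// Routing an OpenAI chat completion through AI Gateway. After reviewing an
// evaluation, you might change the model below (or the provider segment of the
// URL) and compare a new evaluation against the previous one.
const response = await fetch(
  `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/openai/chat/completions`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // candidate model to compare against your current one
      messages: [{ role: "user", content: "Hello, world!" }],
    }),
  },
);

console.log(await response.json());
```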