[AIG] Add Evaluations section with Overview and How to set up Evaluations p… #17061

Closed · wants to merge 20 commits

Commits (20):

- `929c351` Add Evaluations section with Overview and How to set up Evaluations p… (daisyfaithauma, Sep 24, 2024)
- `220bd5d` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `ab3b1ac` Update index.mdx (kathayl, Sep 24, 2024)
- `4df655c` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `281bb24` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `03be66a` Update set-up-evaluations.mdx (kathayl, Sep 24, 2024)
- `b368c07` fixed hyperlink (daisyfaithauma, Sep 25, 2024)
- `f3c841c` Merge branch 'aig-evaluations' of github.com:cloudflare/cloudflare-do… (daisyfaithauma, Sep 25, 2024)
- `36ff819` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 25, 2024)
- `8b422a5` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 25, 2024)
- `0224d68` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 25, 2024)
- `e077d68` Added hyperlinks on RAG and providers (daisyfaithauma, Sep 25, 2024)
- `eced4ed` Removed Rerun details (daisyfaithauma, Sep 25, 2024)
- `dd6e3c2` Update index.mdx (kathayl, Sep 26, 2024)
- `51648db` Update set-up-evaluations.mdx (kathayl, Sep 26, 2024)
- `5c9ea82` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 26, 2024)
- `7f00f51` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 26, 2024)
- `13a4e49` Update src/content/docs/ai-gateway/observability/evaluations/index.mdx (daisyfaithauma, Sep 26, 2024)
- `15e5f53` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 26, 2024)
- `7babd78` Update src/content/docs/ai-gateway/observability/evaluations/set-up-e… (daisyfaithauma, Sep 26, 2024)

17 changes: 17 additions & 0 deletions src/content/docs/ai-gateway/observability/evaluations/index.mdx
@@ -0,0 +1,17 @@
---
title: Evaluations
pcx_content_type: navigation
order: 1
---

# Overview

Understanding your application’s performance is essential for optimizing and improving it. Developers often have different priorities, and finding the optimal solution involves balancing key factors such as cost, latency, and accuracy. For some, low-latency responses are critical, while others may prioritize accuracy or cost-efficiency.

AI Gateway’s Evaluations provides the data needed to make informed decisions on how to optimize your AI application. Whether it’s adjusting the model, provider, or prompt, this feature delivers insights into key metrics around performance, speed, and cost. It empowers developers to better understand their application’s behavior, ensuring improved accuracy, reliability, and customer satisfaction.

Datasets are collections of logs stored for analysis that can be used for an evaluation. You can create datasets by applying filters in the Logs or Evaluations tab, which helps narrow down specific logs for evaluation.

Our first step toward comprehensive AI evaluations starts with human-in-the-loop feedback (currently in open beta). Future updates will include automated scoring using large language models (LLMs), comparisons across multiple datasets, and prompt evaluations. These enhancements will help you make informed, data-driven decisions to ensure your applications are efficient and cost-effective.

[Learn how to set up an evaluation](https://developers.cloudflare.com/ai-gateway/observability/evaluations/set-up-evaluations/) including creating datasets, selecting evaluators, and running the evaluation process.

61 changes: 61 additions & 0 deletions src/content/docs/ai-gateway/observability/evaluations/set-up-evaluations.mdx
@@ -0,0 +1,61 @@
---
pcx_content_type: how-to
title: Set up Evaluations
sidebar:
order: 2
---

This guide walks you through the process of setting up an evaluation in AI Gateway. These steps are done in the [Cloudflare dashboard](https://dash.cloudflare.com/).

## 1. Select or create a dataset

Datasets are collections of logs stored for analysis that can be used in an evaluation. You can create datasets from either the Logs tab or the Evaluations tab by applying filters.

### Set up a dataset from the Logs tab

1. Apply filters (for example, by model or provider) to narrow down your logs.
2. Select **Save dataset** to store the filtered logs for future analysis.

You can manage datasets by selecting **Manage datasets** from the Logs tab. From here, you can:

- Edit
- Update
- Delete
- Save a new version of a dataset

:::note[Note]

Datasets can currently only be created with one item per filter (e.g., one model, one provider). Future updates will allow more flexibility in dataset creation.

:::
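
To make the dataset concept concrete, here is a minimal TypeScript sketch. It is purely illustrative: the `GatewayLog` and `DatasetFilter` shapes are assumptions made for this example, not an AI Gateway API, but the filter reflects the current one-value-per-filter constraint.

```ts
// Purely illustrative shapes; assumptions for this sketch, not an AI Gateway API.
interface GatewayLog {
  provider: string;
  model: string;
  cost?: number;     // USD; present only when cost data is available
  duration: number;  // request duration in milliseconds
  feedback?: 1 | -1; // thumbs-up / thumbs-down annotation, if any
}

// A dataset is conceptually "all logs matching a saved filter".
// Current limitation: one value per filter (one model, one provider).
interface DatasetFilter {
  provider: string;
  model: string;
}

function selectDataset(logs: GatewayLog[], filter: DatasetFilter): GatewayLog[] {
  return logs.filter(
    (log) => log.provider === filter.provider && log.model === filter.model,
  );
}
```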

## 2. Select evaluators

After creating a dataset, choose the evaluation parameters (a sketch of what each evaluator measures follows this list):

- Cost: Calculates the average cost of inference requests within the dataset (only for requests with [cost data](https://developers.cloudflare.com/ai-gateway/observability/costs/)).
- Speed: Calculates the average duration of inference requests within the dataset.
- Performance:
  - Human feedback: Measures performance based on human feedback. Users can annotate logs with a thumbs-up or thumbs-down in the Logs tab, and the evaluation scores performance from these annotations.
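
As a rough mental model of what these evaluators compute, here is a minimal TypeScript sketch. The log shape and the thumbs-up aggregation are assumptions made for illustration; AI Gateway performs these calculations for you in the dashboard.

```ts
// Assumed log fields for this sketch (a subset of the illustrative GatewayLog above).
type EvaluatedLog = {
  cost?: number;     // USD; present only when cost data is available
  duration: number;  // request duration in milliseconds
  feedback?: 1 | -1; // 1 = thumbs-up, -1 = thumbs-down
};

const avg = (values: number[]) =>
  values.reduce((sum, v) => sum + v, 0) / (values.length || 1);

function evaluate(dataset: EvaluatedLog[]) {
  // Cost: averaged only over logs that have cost data; other logs are excluded.
  const costs = dataset
    .filter((log) => log.cost !== undefined)
    .map((log) => log.cost!);

  // Speed: average duration across all logs in the dataset.
  const durations = dataset.map((log) => log.duration);

  // Performance (human feedback): share of annotated logs marked thumbs-up.
  // This is one plausible aggregation, not a documented formula.
  const annotated = dataset.filter((log) => log.feedback !== undefined);
  const thumbsUp = annotated.filter((log) => log.feedback === 1).length;

  // The dashboard also reports how many logs each metric was calculated from.
  return {
    avgCost: avg(costs),
    logsWithCost: costs.length,
    avgDuration: avg(durations),
    logsWithDuration: durations.length,
    positiveRate: thumbsUp / (annotated.length || 1),
    logsWithFeedback: annotated.length,
  };
}
```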

:::note[Note]

Additional evaluators will be introduced in future updates to expand performance analysis capabilities.

:::

## 3. Name, review, and run the evaluation

1. Create a unique name for your evaluation to reference it in the dashboard.
2. Review the selected dataset and evaluators.
3. Select **Run evaluation** to start the process.

## 4. Review and analyze results

Evaluation results will appear in the Evaluations tab. The results show the status of the evaluation (e.g., in progress, completed, or error). Metrics for the selected evaluators will be displayed, excluding any logs with missing fields. You will also see the number of logs used to calculate each metric.

Use these insights to adjust your setup and optimize based on your application's priorities. Based on the results, you may choose to:

- Change the model or provider (see the sketch after this list)
- Adjust your prompts
- Explore further optimizations, such as setting up Retrieval Augmented Generation (RAG)
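
For example, changing the model or provider usually means pointing your client at a different model, or a different provider path, on the same gateway, then running a fresh evaluation on the new logs. The sketch below is a hedged example: the account ID, gateway ID, and model name are placeholders, and you should confirm the exact request format for your provider in the AI Gateway provider documentation.

```ts
// Hedged sketch: placeholders throughout; confirm the exact URL and body shape
// for your provider in the AI Gateway provider documentation.
const ACCOUNT_ID = "your-account-id";
const GATEWAY_ID = "your-gateway-id";

// Routing an OpenAI chat completion through AI Gateway. After reviewing an
// evaluation, you might change the model below (or the provider segment of the
// URL) and compare a new evaluation against the previous one.
const response = await fetch(
  `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/openai/chat/completions`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // candidate model to compare against your current one
      messages: [{ role: "user", content: "Hello, world!" }],
    }),
  },
);

console.log(await response.json());
```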