
# Refactor Prometheus and Add Request Level Metrics #2316

Merged Jan 31, 2024 · 52 commits (changes shown from 42 commits)

## Commits
56d398b
added first refactor of metrics
robertgshaw2-neuralmagic Dec 30, 2023
2239e73
refactored to use counters rather than gauges for monotonically incre…
robertgshaw2-neuralmagic Dec 30, 2023
1e6ad74
added dev notebook, started running live
robertgshaw2-neuralmagic Dec 30, 2023
10d5353
first example where things did not completely break :)!
robertgshaw2-neuralmagic Dec 31, 2023
5199cdd
end to end things seem to be working
robertgshaw2-neuralmagic Dec 31, 2023
f69f639
logging properly to prom, seeing the metrics come up
robertgshaw2-neuralmagic Dec 31, 2023
0c24fc3
added full example setting up prom/grafana logging, including default…
robertgshaw2-neuralmagic Dec 31, 2023
874df77
removed local logging
robertgshaw2-neuralmagic Dec 31, 2023
63aecbc
stashing refactor to stateless loggers
robertgshaw2-neuralmagic Jan 1, 2024
7782baf
refactored code to support stateless iteration logging; this made eve…
robertgshaw2-neuralmagic Jan 1, 2024
92dda00
missed metrics file :)
robertgshaw2-neuralmagic Jan 1, 2024
6e7f715
made seq_group implementation simplier
robertgshaw2-neuralmagic Jan 1, 2024
67aaed7
updated metrics page ordering
robertgshaw2-neuralmagic Jan 1, 2024
69093b2
updated formatting / type checking
robertgshaw2-neuralmagic Jan 4, 2024
114a4c9
updated api server to support /metrics so I could run performance ben…
robertgshaw2-neuralmagic Jan 4, 2024
f68f4f7
Update async_llm_engine.py
robertgshaw2-neuralmagic Jan 4, 2024
ce0534f
quality
robertgshaw2-neuralmagic Jan 5, 2024
05b3206
Merge branch 'vllm-project:main' into rs/feature/metrics
robertgshaw2-neuralmagic Jan 5, 2024
450dfc2
cleaned up to use only one Stat type; added other metric
robertgshaw2-neuralmagic Jan 5, 2024
0e65765
quality
robertgshaw2-neuralmagic Jan 5, 2024
9fee85f
Update outputs.py
robertgshaw2-neuralmagic Jan 5, 2024
9cdd6c4
reverted changes to api_server.py
robertgshaw2-neuralmagic Jan 5, 2024
d1dcac6
removed line to match base
robertgshaw2-neuralmagic Jan 5, 2024
a42c3ca
stash to move to other machine
robertgshaw2-neuralmagic Jan 6, 2024
e2207db
factored per simon-mo request
robertgshaw2-neuralmagic Jan 6, 2024
a90d447
readded files
robertgshaw2-neuralmagic Jan 6, 2024
567d32f
fixed bugs in initial version
robertgshaw2-neuralmagic Jan 6, 2024
f363df7
e2e functional testing complete
robertgshaw2-neuralmagic Jan 6, 2024
519581b
readded images
robertgshaw2-neuralmagic Jan 6, 2024
551a3c0
quality
robertgshaw2-neuralmagic Jan 6, 2024
1f38f15
Merge branch 'main' into rs/feature/metrics
robertgshaw2-neuralmagic Jan 7, 2024
32d2259
fixed merge issue
robertgshaw2-neuralmagic Jan 7, 2024
14f36ae
Update grafana.json
robertgshaw2-neuralmagic Jan 7, 2024
6a81d89
updated with simplier more direct implementation
robertgshaw2-neuralmagic Jan 20, 2024
3a28d48
smoke test confirms changes are working
robertgshaw2-neuralmagic Jan 20, 2024
9c465c7
Merge branch 'main' into rs/feature/metrics
robertgshaw2-neuralmagic Jan 20, 2024
d800e7f
simplified example
robertgshaw2-neuralmagic Jan 20, 2024
f117808
format
robertgshaw2-neuralmagic Jan 20, 2024
30e88e4
Update benchmark_serving.py
robertgshaw2-neuralmagic Jan 20, 2024
6bbba50
Update llm_engine.py
robertgshaw2-neuralmagic Jan 20, 2024
3cff058
Update README.md
robertgshaw2-neuralmagic Jan 20, 2024
38578d3
Update README.md
robertgshaw2-neuralmagic Jan 20, 2024
629e1d3
Update examples/production_monitoring/README.md
robertgshaw2-neuralmagic Jan 24, 2024
cef0432
Update examples/production_monitoring/README.md
robertgshaw2-neuralmagic Jan 24, 2024
dc4eaa5
Update vllm/engine/llm_engine.py
robertgshaw2-neuralmagic Jan 24, 2024
d517924
Update vllm/sequence.py
robertgshaw2-neuralmagic Jan 24, 2024
0b726c5
fixes simon's concerns and validates working properly. renames metric…
robertgshaw2-neuralmagic Jan 26, 2024
9b76d60
Merge branch 'main' into rs/feature/metrics
robertgshaw2-neuralmagic Jan 26, 2024
6fed96c
format
robertgshaw2-neuralmagic Jan 26, 2024
7f1379b
new line
robertgshaw2-neuralmagic Jan 26, 2024
3c18cb5
new line
robertgshaw2-neuralmagic Jan 26, 2024
6b9afa2
confirmed everything is working e2e
robertgshaw2-neuralmagic Jan 26, 2024
## examples/production_monitoring/README.md (72 additions, 0 deletions)

# vLLM + Prometheus/Grafana

This is a simple example that shows you how to connect vLLM metric logging to the Prometheus/Grafana stack.

For this example, we launch Prometheus and Grafana via Docker. Install:
- [`docker`](https://docs.docker.com/engine/install/)
- [`docker compose`](https://docs.docker.com/compose/install/linux/#install-using-the-repository)

### Launch

Prometheus metric logging is enabled by default in the OpenAI-compatible server. Launch via the entrypoint:
```bash
python3 ../../vllm/entrypoints/openai/api_server.py \
--model mistralai/Mistral-7B-v0.1 \
--max-model-len 2048 \
--disable-log-requests
```

Launch Prometheus and Grafana servers with `docker compose`:
```bash
docker compose up
```

Submit some sample requests to the server:
```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
python3 ../../benchmarks/benchmark_serving.py \
--model mistralai/Mistral-7B-v0.1 \
--tokenizer mistralai/Mistral-7B-v0.1 \
--endpoint /v1/completions \
--dataset ShareGPT_V3_unfiltered_cleaned_split.json \
--num-prompts 200 \
--request-rate 3.0
```
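
The `--request-rate 3.0` flag paces the benchmark at roughly 3 requests per second, with arrivals following a Poisson process, so the gaps between requests are exponentially distributed rather than uniform. A minimal sketch of such a schedule (the names below are illustrative, not taken from `benchmark_serving.py`):

```python
import random

def poisson_arrival_times(rate_per_s: float, num_requests: int, seed: int = 0):
    """Cumulative send times for a Poisson arrival process.

    Drawing inter-arrival gaps from an exponential distribution with
    mean 1/rate yields a Poisson process of the given rate.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(num_requests):
        t += rng.expovariate(rate_per_s)  # mean gap = 1/rate seconds
        times.append(t)
    return times

# 200 requests at 3 req/s: the whole run takes about 200/3 ≈ 67 seconds.
times = poisson_arrival_times(rate_per_s=3.0, num_requests=200)
print(f"last request sent at ~{times[-1]:.0f}s")
```

Bursty Poisson arrivals stress the scheduler more realistically than evenly spaced requests, which is why the benchmark exposes the rate rather than a fixed sleep interval.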

Navigating to [`http://localhost:8000/metrics`](http://localhost:8000/metrics) will show the raw Prometheus metrics being exposed by vLLM.
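
The page is served in the Prometheus text exposition format: `# HELP`/`# TYPE` comment lines followed by `name{labels} value` samples. The sketch below pulls values out of such a payload; it runs against a made-up sample rather than the live server, and the metric names are only illustrative of vLLM's `vllm:`-prefixed metrics:

```python
import re

# Hypothetical excerpt of a /metrics response, not captured from a real run.
SAMPLE = """\
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="mistralai/Mistral-7B-v0.1"} 8421.0
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="mistralai/Mistral-7B-v0.1"} 3.0
"""

def parse_metrics(text: str) -> dict:
    """Map metric name -> float value, ignoring # comment lines."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        # name, optional {label="..."} block, then the sample value
        m = re.match(r"([^{\s]+)(?:\{[^}]*\})?\s+(\S+)", line)
        if m:
            samples[m.group(1)] = float(m.group(2))
    return samples

metrics = parse_metrics(SAMPLE)
print(metrics["vllm:num_requests_running"])  # prints 3.0
```

Prometheus itself scrapes and parses this format for you; a snippet like this is only useful for quick ad-hoc checks or tests.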

### Grafana Dashboard

Navigate to [`http://localhost:3000`](http://localhost:3000). Log in with the default username (`admin`) and password (`admin`).

#### Add Prometheus Data Source

Navigate to [`http://localhost:3000/connections/datasources/new`](http://localhost:3000/connections/datasources/new) and select Prometheus.

On the Prometheus configuration page, add the `Prometheus Server URL` under `Connection`. For this setup, Grafana and Prometheus run in separate containers, so we need the IP address of the Prometheus container. Run the following to look up the name of your Prometheus container:

```bash
docker container ls

>> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
>> 6b2eb9d7aa99 grafana/grafana:latest "/run.sh" 45 minutes ago Up 45 minutes 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp production_monitoring-grafana-1
>> d9b32bc6a02b prom/prometheus:latest "/bin/prometheus --c…" 45 minutes ago Up 45 minutes 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp production_monitoring-prometheus-1
```

Run the following to look up the IP address (replace `production_monitoring-prometheus-1` with your container name):
```bash
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' production_monitoring-prometheus-1
>> 172.18.0.2
```

So, in our case, the `Prometheus Server URL` should be: `http://172.18.0.2:9090`.
**Collaborator suggested change** (replacing the container-IP lookup above with Docker Compose's built-in DNS):
On the Prometheus configuration page, add the `Prometheus Server URL` under `Connection`. For this setup, Grafana and Prometheus run in separate containers, but Docker Compose creates a DNS name for each container, so you can simply use `http://prometheus:9090`.

See https://docs.docker.com/compose/networking/

**Author (robertgshaw2-neuralmagic):** lol this is so much better

**Author (robertgshaw2-neuralmagic):** ill just double check this works


Click `Save & Test`. You should get a green check saying "Successfully queried the Prometheus API".

#### Import Dashboard

Navigate to [`http://localhost:3000/dashboard/import`](http://localhost:3000/dashboard/import), upload `grafana.json`, and select the `prometheus` datasource.

You should see a screen that looks like the following:

![Grafana Dashboard Image](images/vllm-grafana-dashboard.png)
## examples/production_monitoring/docker-compose.yaml (19 additions, 0 deletions)

# docker-compose.yaml
version: "3"

services:
prometheus:
image: prom/prometheus:latest
extra_hosts:
- "host.docker.internal:host-gateway" # allow a direct connection from container to the local machine
ports:
- "9090:9090" # the default port used by Prometheus
volumes:
- ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml # mount Prometheus config file

grafana:
image: grafana/grafana:latest
depends_on:
- prometheus
ports:
- "3000:3000" # the default port used by Grafana
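
The compose file mounts a `prometheus.yaml` from the working directory that is not shown in this diff. A minimal scrape config consistent with the setup above could look like this (the target and interval here are assumptions: vLLM listening on the host's port 8000, reached through the `host.docker.internal` mapping defined in `extra_hosts`):

```yaml
# prometheus.yaml — illustrative sketch, not part of this diff
global:
  scrape_interval: 5s            # how often to pull /metrics; tune to taste

scrape_configs:
  - job_name: vllm
    static_configs:
      # vLLM's OpenAI-compatible server running on the host machine,
      # reachable from inside the container via the extra_hosts mapping
      - targets: ["host.docker.internal:8000"]
```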