Add more Prometheus metrics #2764

Merged · 50 commits · Apr 28, 2024
319bc37
Add vllm:request_max_tokens
ronensc Jan 31, 2024
9d4ce95
Remove trailing dots from comments that are not sentences
ronensc Jan 31, 2024
ae7eb6e
Add vllm:request_success
ronensc Jan 31, 2024
2188daa
Remove redundant space
ronensc Jan 31, 2024
e41c15f
Add vllm:request_n
ronensc Jan 31, 2024
71ec7c3
Add vllm:prompt_tokens
ronensc Feb 5, 2024
45bd839
Add vllm:generation_tokens
ronensc Feb 5, 2024
f237c50
Add comments
ronensc Feb 5, 2024
f17a966
Rename metrics
ronensc Feb 5, 2024
8e0d8c1
Make type hint compatible with python 3.8
ronensc Feb 7, 2024
9ed04ef
Rename metrics
ronensc Feb 12, 2024
6aebd80
Merge branch 'main' into more-metrics
ronensc Feb 19, 2024
de84dac
Merge branch 'main' into more-metrics
ronensc Feb 21, 2024
35944cc
Merge branch 'main' into more-metrics
ronensc Feb 26, 2024
76cd774
Consider the value of `max_model_len` when building buckets
ronensc Feb 26, 2024
93b0796
Merge branch 'main' into more-metrics
ronensc Mar 4, 2024
3643e0c
Merge branch 'main' into more-metrics
ronensc Mar 13, 2024
60f1049
Fix too long line warning
ronensc Mar 13, 2024
95daee7
Add HTTP metrics from prometheus-fastapi-instrumentator
ronensc Mar 26, 2024
cf4acef
Merge remote-tracking branch 'origin/main' into more-metrics
ronensc Mar 26, 2024
0f8dae9
Make ruff happy
ronensc Mar 26, 2024
bce096c
Remove vllm:request_params_max_tokens
ronensc Mar 28, 2024
e15f653
Move deprecated metrics to legacy section
ronensc Mar 29, 2024
7b05baa
Add metric vllm:request_params_best_of
ronensc Apr 1, 2024
0958259
Revert to exposing /metrics using make_asgi_app()
ronensc Apr 1, 2024
5e2c246
Register 'finished_reason' label name on metric creation
ronensc Apr 1, 2024
5cc7b64
Merge branch 'main' into more-metrics
ronensc Apr 1, 2024
1eeb31d
Fix merge issues
ronensc Apr 1, 2024
4c79cbe
Merge branch 'main' into more-metrics
ronensc Apr 17, 2024
4c41a89
Fix merge issues
ronensc Apr 17, 2024
ac8435b
Add 3 panels to Grafana dashboard
ronensc Apr 17, 2024
f22abf5
Change order of deprecated metrics and add comments
ronensc Apr 19, 2024
9352ce7
Rename LABEL_NAME_FINISHED_REASON and make it a class variable of Met…
ronensc Apr 19, 2024
b2c0445
Set minimum version to prometheus-fastapi-instrumentator
ronensc Apr 19, 2024
e147575
Change finished_reason from counter to list
ronensc Apr 19, 2024
f9bc64e
Compute deprecated metrics using the newer version
ronensc Apr 19, 2024
5ded719
Rename variables. Strip '_lst' suffix.
ronensc Apr 19, 2024
dd84d51
Update naming schema Stats to have the _suffix pattern
ronensc Apr 19, 2024
e127a4c
Fix the incorrect logic for chunked prefill
ronensc Apr 19, 2024
2d36609
Restore num_prompt_tokens_iter and num_generation_tokens_iter
ronensc Apr 19, 2024
e81d95a
Refactor metrics logging methods
ronensc Apr 19, 2024
ece2ec0
Reorder metrics definition to match Stats order
ronensc Apr 19, 2024
5a658c8
Rename metric variables to match suffix convention
ronensc Apr 19, 2024
717b559
Make mypy happy
ronensc Apr 20, 2024
61fad41
Merge branch 'main' into more-metrics
robertgshaw2-neuralmagic Apr 25, 2024
f103ad8
./format
robertgshaw2-neuralmagic Apr 25, 2024
bf1a0c4
Merge branch 'main' into more-metrics
robertgshaw2-neuralmagic Apr 28, 2024
cc0d5eb
fixed chunked prefill logic
robertgshaw2-neuralmagic Apr 28, 2024
d7f493b
make linter happy
robertgshaw2-neuralmagic Apr 28, 2024
54bf260
fixed issues with chunked prefill X metrics
robertgshaw2-neuralmagic Apr 28, 2024
283 changes: 283 additions & 0 deletions examples/production_monitoring/grafana.json
@@ -873,6 +873,289 @@
],
"title": "Cache Utilization",
"type": "timeseries"
},
{
"type": "heatmap",
"title": "Request Prompt Length",
"description": "Heatmap of request prompt length",
"gridPos": {
"x": 0,
"y": 24,
"w": 12,
"h": 8
},
"datasource": {
"uid": "prometheus",
"type": "prometheus"
},
"id": 12,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"refId": "A",
"expr": "sum by(le) (increase(vllm:request_prompt_tokens_bucket{model_name=\"$model_name\"}[$__rate_interval]))",
"range": true,
"instant": false,
"editorMode": "builder",
"legendFormat": "{{le}}",
"useBackend": false,
"disableTextWrap": false,
"fullMetaSearch": false,
"includeNullMetadata": true,
"format": "heatmap"
}
],
"options": {
"calculate": false,
"yAxis": {
"axisPlacement": "left",
"reverse": false,
"unit": "none",
"axisLabel": "Prompt Length"
},
"rowsFrame": {
"layout": "auto",
"value": "Request count"
},
"color": {
"mode": "scheme",
"fill": "dark-orange",
"scale": "exponential",
"exponent": 0.5,
"scheme": "Spectral",
"steps": 64,
"reverse": false,
"min": 0
},
"cellGap": 1,
"filterValues": {
"le": 1e-9
},
"tooltip": {
"show": true,
"yHistogram": true
},
"legend": {
"show": true
},
"exemplars": {
"color": "rgba(255,0,255,0.7)"
},
"cellValues": {
"unit": "none"
}
},
"fieldConfig": {
"defaults": {
"custom": {
"scaleDistribution": {
"type": "linear"
},
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
}
}
},
"overrides": []
},
"pluginVersion": "10.2.0"
},
{
"datasource": {
"uid": "prometheus",
"type": "prometheus"
},
"type": "heatmap",
"title": "Request Generation Length",
"description": "Heatmap of request generation length",
"gridPos": {
"x": 12,
"y": 24,
"w": 12,
"h": 8
},
"id": 13,
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"refId": "A",
"expr": "sum by(le) (increase(vllm:request_generation_tokens_bucket{model_name=\"$model_name\"}[$__rate_interval]))",
"range": true,
"instant": false,
"editorMode": "builder",
"legendFormat": "{{le}}",
"useBackend": false,
"disableTextWrap": false,
"fullMetaSearch": false,
"includeNullMetadata": true,
"format": "heatmap"
}
],
"options": {
"calculate": false,
"yAxis": {
"axisPlacement": "left",
"reverse": false,
"unit": "none",
"axisLabel": "Generation Length"
},
"rowsFrame": {
"layout": "auto",
"value": "Request count"
},
"color": {
"mode": "scheme",
"fill": "dark-orange",
"scale": "exponential",
"exponent": 0.5,
"scheme": "Spectral",
"steps": 64,
"reverse": false,
"min": 0
},
"cellGap": 1,
"filterValues": {
"le": 1e-9
},
"tooltip": {
"show": true,
"yHistogram": true
},
"legend": {
"show": true
},
"exemplars": {
"color": "rgba(255,0,255,0.7)"
},
"cellValues": {
"unit": "none"
}
},
"fieldConfig": {
"defaults": {
"custom": {
"scaleDistribution": {
"type": "linear"
},
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
}
}
},
"overrides": []
},
"pluginVersion": "10.2.0"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"custom": {
"drawStyle": "line",
"lineInterpolation": "linear",
"barAlignment": 0,
"lineWidth": 1,
"fillOpacity": 0,
"gradientMode": "none",
"spanNulls": false,
"insertNulls": false,
"showPoints": "auto",
"pointSize": 5,
"stacking": {
"mode": "none",
"group": "A"
},
"axisPlacement": "auto",
"axisLabel": "",
"axisColorMode": "text",
"axisBorderShow": false,
"scaleDistribution": {
"type": "linear"
},
"axisCenteredZero": false,
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"thresholdsStyle": {
"mode": "off"
}
},
"color": {
"mode": "palette-classic"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 32
},
"id": 11,
"options": {
"tooltip": {
"mode": "single",
"sort": "none"
},
"legend": {
"showLegend": true,
"displayMode": "list",
"placement": "bottom",
"calcs": []
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "sum by(finished_reason) (increase(vllm:request_success_total{model_name=\"$model_name\"}[$__rate_interval]))",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
"interval": "",
"legendFormat": "__auto",
"range": true,
"refId": "A",
"useBackend": false
}
],
"title": "Finish Reason",
"description": "Number of finished requests by their finish reason: either an EOS token was generated or the max sequence length was reached.",
"type": "timeseries"
}
],
"refresh": "",
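Both heatmap panels above use `sum by(le) (increase(<histogram>_bucket[$__rate_interval]))`. Prometheus histogram buckets are cumulative counters keyed by an upper bound `le`, and Grafana's heatmap de-accumulates them into per-range cells. A toy sketch of that de-accumulation with hypothetical counts:

```python
def per_bucket_counts(cumulative: dict) -> dict:
    """Turn cumulative Prometheus-style 'le' buckets into per-range
    counts, as a Grafana heatmap renders them."""
    counts = {}
    previous = 0
    for le in sorted(cumulative, key=float):
        counts[le] = cumulative[le] - previous
        previous = cumulative[le]
    return counts

# Hypothetical cumulative prompt-length observations:
# 4 prompts <= 100 tokens, 9 <= 200, 10 <= 500.
print(per_bucket_counts({"100": 4, "200": 9, "500": 10}))
# prints {'100': 4, '200': 5, '500': 1}
```

This is also why the panels set `"filterValues": {"le": 1e-9}`: empty buckets de-accumulate to zero and are hidden rather than drawn as blank cells.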
1 change: 1 addition & 0 deletions requirements-common.txt
@@ -10,6 +10,7 @@ fastapi
uvicorn[standard]
pydantic >= 2.0 # Required for OpenAI server.
prometheus_client >= 0.18.0
prometheus-fastapi-instrumentator >= 7.0.0
tiktoken == 0.6.0 # Required for DBRX tokenizer
lm-format-enforcer == 0.9.3
outlines == 0.0.34 # Requires torch >= 2.1.0
2 changes: 1 addition & 1 deletion vllm/core/scheduler.py
@@ -321,7 +321,7 @@ def abort_seq_group(self, request_id: Union[str, Iterable[str]]) -> None:
for seq_group in state_queue:
if not request_ids:
# Using 'break' here may add two extra iterations,
-                # but is acceptable to reduce complexity .
+                # but is acceptable to reduce complexity.
break
if seq_group.request_id in request_ids:
# Appending aborted group into pending list.