
Add latency metrics #1870

Closed
simon-mo opened this issue Nov 30, 2023 · 11 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@simon-mo
Collaborator

After #1662 (initial metrics support) and #1756 (refactoring the chat endpoint), it will become practical to include latency metrics that are important for production (courtesy of @Yard1):

  • histogram of time to first token, and gauge of the mean, in ms
  • histogram of inter-token latency, and gauge of the mean, in ms
  • histogram of end-to-end time per request, and gauge of the mean, in ms
  • gauge of mean tokens per second per request; we currently only track prefill and generation throughput, not per-request throughput

A natural place to do this would be the LLM engine or the chat completion API, whichever is less intrusive.
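
For concreteness, here is a minimal sketch of what these four metrics could look like using prometheus_client. The metric names, label set, and bucket boundaries below are invented for illustration and are not vLLM's actual definitions; the "gauge of the mean" variants are omitted since they would need running aggregates:

from prometheus_client import Gauge, Histogram

# Illustrative only: names, labels, and buckets are made up for this sketch.
LABELS = ["model_name"]

time_to_first_token_ms = Histogram(
    "request_time_to_first_token_ms",
    "Time to first token per request (ms).",
    labelnames=LABELS,
    buckets=[10, 25, 50, 100, 250, 500, 1000, 2500, 5000],
)
inter_token_latency_ms = Histogram(
    "request_inter_token_latency_ms",
    "Latency between consecutive output tokens (ms).",
    labelnames=LABELS,
    buckets=[1, 5, 10, 25, 50, 100, 250, 500],
)
e2e_latency_ms = Histogram(
    "request_e2e_latency_ms",
    "End-to-end latency per request (ms).",
    labelnames=LABELS,
    buckets=[100, 250, 500, 1000, 2500, 5000, 10000, 30000],
)
tokens_per_second = Gauge(
    "request_tokens_per_second",
    "Mean output tokens per second per request.",
    labelnames=LABELS,
)

def record_finished_request(model_name, ttft_ms, inter_token_ms, e2e_ms, num_output_tokens):
    """Record latency metrics for one finished request (all times in ms)."""
    time_to_first_token_ms.labels(model_name).observe(ttft_ms)
    for itl in inter_token_ms:
        inter_token_latency_ms.labels(model_name).observe(itl)
    e2e_latency_ms.labels(model_name).observe(e2e_ms)
    if e2e_ms > 0:
        tokens_per_second.labels(model_name).set(num_output_tokens / (e2e_ms / 1000.0))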

@simon-mo added the help wanted and good first issue labels on Nov 30, 2023
@Yard1
Collaborator

Yard1 commented Dec 2, 2023

I would suggest placing them in the engine - it will be more generic that way.
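
As a rough illustration of engine-side collection (this sketch does not reflect vLLM's actual engine internals or data structures), the raw timings could be captured per request as output tokens are produced and then fed into the metrics above:

import time

class RequestTiming:
    """Illustrative per-request timing state; not vLLM's actual classes."""

    def __init__(self) -> None:
        self.arrival = time.monotonic()
        self.first_token_time: float | None = None
        self.last_token_time: float | None = None
        self.inter_token_latencies_ms: list[float] = []

    def on_output_token(self) -> None:
        """Call once per generated token, e.g. from the engine's step loop."""
        now = time.monotonic()
        if self.first_token_time is None:
            self.first_token_time = now  # TTFT = first_token_time - arrival
        elif self.last_token_time is not None:
            self.inter_token_latencies_ms.append((now - self.last_token_time) * 1000.0)
        self.last_token_time = now

    def time_to_first_token_ms(self) -> float | None:
        if self.first_token_time is None:
            return None
        return (self.first_token_time - self.arrival) * 1000.0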

@robertgshaw2-neuralmagic
Sponsor Collaborator

I am working on a PR for this

@robertgshaw2-neuralmagic
Sponsor Collaborator

first draft #2316

@hmellor closed this as completed on Mar 28, 2024
@hmellor reopened this on Mar 28, 2024
@hmellor
Collaborator

hmellor commented Mar 28, 2024

#2764 looks to add a request-level histogram of token throughput.

@grandiose-pizza
Contributor

Hi,

@Yard1 @robertgshaw2-neuralmagic
I want to use the metrics. I have exposed an API using api_server.py.

When I request http://localhost:8075/metrics/, I get the following output instead of the values described in the Metrics class. How can I see those metrics?

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 6290.0
python_gc_objects_collected_total{generation="1"} 8336.0
python_gc_objects_collected_total{generation="2"} 4726.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 826.0
python_gc_collections_total{generation="1"} 75.0
python_gc_collections_total{generation="2"} 6.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="12",version="3.10.12"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.098353664e+010
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 7.31774976e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.71188972784e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 18.27
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 44.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06

@hmellor
Collaborator

hmellor commented Mar 31, 2024

You need to make a request in order for the metrics to be populated.
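
For example, after sending a chat/completions request, the vLLM-specific series (they carry a vllm prefix) should start appearing; assuming the port from your earlier comment, something like:

curl -s http://localhost:8075/metrics | grep vllm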

@grandiose-pizza
Contributor

grandiose-pizza commented Mar 31, 2024

I am making a request with a curl command and then monitoring the /metrics endpoint.

But I can't see the metrics shown in this screenshot.

I think I may need to add something to api_server.py to point it to metrics.py, but I am unsure what.

@hmellor
Collaborator

hmellor commented Mar 31, 2024

Are you curling either /v1/chat/completions or /v1/completions?

@grandiose-pizza
Contributor

grandiose-pizza commented Mar 31, 2024

Are you curling either /v1/chat/completions or /v1/completions?

Yes. /v1/chat/completions

curl -X 'POST' \
  'http://localhost:8075/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "messages": [
    {
      "role": "user",
      "content": "Write an essay on plastic use"
    }
  ],
  "model": "jais-30b",
  "stream": true
}'

@hmellor
Collaborator

hmellor commented Mar 31, 2024

This debugging isn't really relevant to this thread; I'm going to move further discussion to #2850, where it is.

@HarryWu99
Contributor

@hmellor Hello, it seems that the discussion has moved elsewhere. Can this issue be closed?
