Zipkin Post from Envoy 1.13.0 to otel-collector sometime fails because of timestamp value format #572

Closed
marcantoine-bibeau opened this issue Feb 25, 2020 · 5 comments · Fixed by #1446

@marcantoine-bibeau

Envoy 1.13 now only supports the Zipkin v2 API, but it sometimes sends an invalid payload in which the timestamp is written in scientific notation, like this: 1.58266276796031e+15. There is already an open issue on the Envoy side (envoyproxy/envoy#9341), with discussion on the Go side about whether to support this kind of value or not... Can this case be handled in otel-collector? We could be waiting a long time for an Envoy fix...

Request:
POST /api/v2/spans HTTP/1.1
host: tracing_zipkin_cluster
content-type: application/json
x-envoy-internal: true
x-forwarded-for: 10.1.72.4
x-envoy-expected-rq-timeout-ms: 5000
transfer-encoding: chunked
[{"duration":2741,"kind":"SERVER","localEndpoint":{"port":0,"serviceName":"echocall1","ipv4":"10.1.72.4"},"id":"3691a8bd8b50c194","tags":{"upstream_cluster":"h1_ingress_cluster","guid:x-request-id":"e7761137-f1a0-9cae-a4b4-b7c1855d7c52","http.protocol":"HTTP/1.1","component":"proxy","http.url":"http://echocall1.testmgmt.devappdirect.me/echo","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36","downstream_cluster":"-","request_size":"0","response_flags":"-","response_size":"11","http.status_code":"200","http.method":"GET"},"shared":true,"traceId":"3691a8bd8b50c194","name":"something","timestamp":1.58266276796031e+15}]

Response:
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 25 Feb 2020 20:32:51 GMT
Content-Length: 98
json: cannot unmarshal number 1.58266276796031e+15 into Go struct field .timestamp of type uint64
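
For anyone who wants to reproduce this outside the collector, here is a minimal sketch using only Go's standard `encoding/json`. The `span` struct and `decodeStrict` helper are made up for illustration and are not the zipkin-go model; the point is only that a `uint64` field rejects a number literal written with an exponent, while the same value written as a plain integer decodes fine.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// span is a hypothetical stand-in that mirrors only the timestamp field;
// the real Zipkin span model has many more fields.
type span struct {
	Timestamp uint64 `json:"timestamp"`
}

// decodeStrict unmarshals a payload using encoding/json's default rules.
func decodeStrict(payload string) (span, error) {
	var s span
	err := json.Unmarshal([]byte(payload), &s)
	return s, err
}

func main() {
	// Exponent form, as sent by Envoy in the request above.
	_, err := decodeStrict(`{"timestamp":1.58266276796031e+15}`)
	fmt.Println("exponent form:", err)

	// The same instant written as a plain integer.
	s, err := decodeStrict(`{"timestamp":1582662767960310}`)
	fmt.Println("integer form:", s.Timestamp, err)
}
```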

@pjanotti
Contributor

Hi @marcantoine-bibeau, isn't it easier to switch Envoy to emit Jaeger or another format? The collector can still send it to a Zipkin backend.

@owais
Contributor

owais commented Feb 26, 2020

@pjanotti Yes, that's what we already decided to do but thought it was still worth it to report the issue here for zipkin receiver.

@marcantoine-bibeau
Author

Argg, seems like there's always something!!! Jaeger native tracing is broken with Envoy 1.13 :( see envoyproxy/envoy#9849

@marcantoine-bibeau
Author

FYI, I have created a PR on zipkin-go to handle numbers in scientific notation: openzipkin/zipkin-go#161

Once this is merged and a new release is created, the collector would only have to update its zipkin-go version.
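
To give a rough idea of what "handling scientific notation" could look like, here is a small sketch. It is not the code from openzipkin/zipkin-go#161; the `micros` type, `lenientSpan` struct, and `decodeLenient` helper are hypothetical names. The assumption is that a timestamp is acceptable as long as it is a non-negative whole number that fits in a `uint64`, however the literal is spelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strconv"
)

// micros is a hypothetical wrapper for epoch microseconds that accepts
// both plain integers and exponent notation when decoding JSON.
type micros uint64

// UnmarshalJSON implements json.Unmarshaler.
func (m *micros) UnmarshalJSON(b []byte) error {
	text := string(b)
	if text == "null" {
		return nil // by convention, null leaves the value unchanged
	}
	// Fast path: a plain integer literal keeps full 64-bit precision.
	if v, err := strconv.ParseUint(text, 10, 64); err == nil {
		*m = micros(v)
		return nil
	}
	// Fallback: parse as float64, which understands exponent notation.
	f, err := strconv.ParseFloat(text, 64)
	if err != nil {
		return fmt.Errorf("timestamp %s is not a number: %w", text, err)
	}
	// Reject negative, fractional, or out-of-range values.
	if f < 0 || f != math.Trunc(f) || f >= 1<<64 {
		return fmt.Errorf("timestamp %s is not a whole number in uint64 range", text)
	}
	*m = micros(f)
	return nil
}

// lenientSpan mirrors only the timestamp field of a span.
type lenientSpan struct {
	Timestamp micros `json:"timestamp"`
}

// decodeLenient returns the decoded timestamp in microseconds.
func decodeLenient(payload string) (uint64, error) {
	var s lenientSpan
	if err := json.Unmarshal([]byte(payload), &s); err != nil {
		return 0, err
	}
	return uint64(s.Timestamp), nil
}

func main() {
	ts, err := decodeLenient(`{"timestamp":1.58266276796031e+15}`)
	fmt.Println(ts, err)
}
```

One caveat with the float fallback: a float64 only represents integers exactly up to 2^53, so very large values written in exponent form could lose precision. Current epoch-microsecond timestamps are well below that limit.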

@marcantoine-bibeau
Author

So zipkin-go rejected the change... so we can only wait for the Envoy fix :(

@flands flands added this to the Beta 0.4.2 milestone Jul 6, 2020
@flands flands modified the milestones: Beta 0.6.0, Beta 0.7.0 Jul 15, 2020
@bogdandrutu bogdandrutu modified the milestones: Beta 0.7.0, Beta 0.8.0 Jul 30, 2020
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
The `checkpoint` function is executed in a single thread so we can do
the encoding lazily before passing the encoded version of labels to
the exporter. This is a cheap and quick way to avoid encoding the
labels on every collection interval.

Co-authored-by: Rahul Patel <rahulpa@google.com>
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this issue Apr 27, 2023
)

* Set required URL and TOKEN env vars for agent config

If you deploy Splunk OTel Connector outside of the installer script, then the
gateway configuration is used. If the agent configuration is desired instead,
it is really hard to switch:

```
$ > SPLUNK_ACCESS_TOKEN=1234 SPLUNK_REALM=us0 SPLUNK_CONFIG=../cmd/otelcol/config/collector/agent_config.yaml ./otelcol_darwin_amd64
2021/07/23 11:30:17 main.go:198: Set config to ../cmd/otelcol/config/collector/agent_config.yaml
2021/07/23 11:30:17 main.go:283: Set ballast to 168 MiB
2021/07/23 11:30:17 main.go:307: Set memory limit to 460 MiB
2021-07-23T11:30:17.907-0400	info	service/collector.go:283	Starting otelcol...	{"Version": "v0.29.0-43-g491b2f0", "NumCPU": 12}
2021-07-23T11:30:17.910-0400	info	service/collector.go:343	Using memory ballast	{"MiBs": 168}
2021-07-23T11:30:17.910-0400	info	service/collector.go:188	Setting up own telemetry...
2021-07-23T11:30:17.919-0400	info	service/telemetry.go:99	Serving Prometheus metrics	{"address": ":8888", "level": 0, "service.instance.id": "c928a31c-d214-4287-b7bb-d2b802138d1c"}
2021-07-23T11:30:17.919-0400	info	service/collector.go:224	Loading configuration...
2021-07-23T11:30:17.939-0400	info	service/collector.go:240	Applying configuration...
Error: cannot build extensions: cannot build builtExtensions: failed to create extension http_forwarder: 'egress.endpoint' config option cannot be empty
2021/07/23 11:30:17 main.go:94: application run finished with error: cannot build extensions: cannot build builtExtensions: failed to create extension http_forwarder: 'egress.endpoint' config option cannot be empty
```

To make it work:

```
SPLUNK_ACCESS_TOKEN=1234 SPLUNK_REALM=us0 SPLUNK_CONFIG=../cmd/otelcol/config/collector/agent_config.yaml \
  SPLUNK_API_URL=https://api.us0.signalfx.com SPLUNK_INGEST_URL=https://ingest.us0.signalfx.com \
  SPLUNK_TRACE_URL=https://ingest.us0.signalfx.com/v2/trace SPLUNK_HEC_URL=https://ingest.us0.signalfx.com/v1/log \
  ./otelcol_darwin_amd64
```

This is an awful customer experience. The behavior is not surprising as the
agent config needs to optimize for both direct to SaaS and via gateway
data routing. The installer script mitigates this today.

The good news is that this is straightforward to address. Given that
`SPLUNK_REALM` and `SPLUNK_ACCESS_TOKEN` are required and the SaaS URLs
are known, we can set the required environment variables whenever they
are not manually specified.
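
As a rough sketch of that idea (not the code from this commit), the defaults can be derived from the realm using the same URL pattern as the manual command above. The function names `defaultsForRealm` and `setDefaultEnv` are hypothetical; only `os.LookupEnv` and `os.Setenv` are real standard-library calls.

```go
package main

import (
	"fmt"
	"os"
)

// defaultsForRealm returns the SaaS endpoints implied by a realm,
// following the URL pattern used in the manual command above.
func defaultsForRealm(realm string) map[string]string {
	ingest := "https://ingest." + realm + ".signalfx.com"
	return map[string]string{
		"SPLUNK_API_URL":    "https://api." + realm + ".signalfx.com",
		"SPLUNK_INGEST_URL": ingest,
		"SPLUNK_TRACE_URL":  ingest + "/v2/trace",
		"SPLUNK_HEC_URL":    ingest + "/v1/log",
	}
}

// setDefaultEnv fills in each URL variable only when the user has not
// already set it, so manual overrides always win.
func setDefaultEnv() error {
	realm, ok := os.LookupEnv("SPLUNK_REALM")
	if !ok || realm == "" {
		return fmt.Errorf("SPLUNK_REALM must be set")
	}
	for name, value := range defaultsForRealm(realm) {
		if _, set := os.LookupEnv(name); set {
			continue // keep the manually specified value
		}
		if err := os.Setenv(name, value); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := setDefaultEnv(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("SPLUNK_TRACE_URL =", os.Getenv("SPLUNK_TRACE_URL"))
}
```

Using `os.LookupEnv` rather than `os.Getenv` distinguishes "unset" from "set to empty", so an explicitly empty value is left alone.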

* Abstract logic

* Fix lint

* Switch to LookupEnv

* Ensure prereq env vars are set