grpc request may hang when error message is too large after bumping to tonic v0.12 #18039

Open
BugenZhao opened this issue Aug 14, 2024 · 2 comments

@BugenZhao

e2e-sink-test now consistently hangs here:

statement error test-rw-sink-upsert-avro-err-key
create sink sink_err from into_kafka with (
connector = 'kafka',
topic = 'test-rw-sink-upsert-avro-err',
properties.bootstrap.server = 'message_queue:29092',
primary_key = 'int32_field,string_field')
format upsert encode avro (
schema.registry = 'http://schemaregistry:8082');

I found that disabling SCHEMA_REGISTRY_DEBUG here makes the issue go away.

SCHEMA_REGISTRY_DEBUG: 'true'

The only difference is that, with it disabled, the error message no longer contains backtraces from the schema registry:

failed to validate sink: config error: all request confluent registry all timeout, req path ["subjects", "test-rw-sink-upsert-avro-err-key", "versions", "latest"], urls http://schemaregistry:8082/
	confluent schema registry error 40401: Subject 'test-rw-sink-upsert-avro-err-key' not found. io.confluent.rest.exceptions.RestNotFoundException: Subject 'test-rw-sink-upsert-avro-err-key' not found.
- io.confluent.rest.exceptions.RestNotFoundException: Subject 'test-rw-sink-upsert-avro-err-key' not found.
-	at io.confluent.kafka.schemaregistry.rest.exceptions.Errors.subjectNotFoundException(Errors.java:78)
-	at io.confluent.kafka.schemaregistry.rest.resources.SubjectVersionsResource.getSchemaByVersion(SubjectVersionsResource.java:154)
-	at jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)

Based on this, I suspect the cause is that we always encode the ServerError in gRPC (HTTP/2) headers (#13282), and there is an outstanding issue where tonic 0.12 hangs forever when the header size exceeds some limit.

let serialized = bincode::serialize(&source).unwrap();
let mut metadata = MetadataMap::new();
metadata.insert_bin(ERROR_KEY, MetadataValue::from_bytes(&serialized));
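
To illustrate why a long backtrace matters here, the following is a minimal std-only sketch of checking an error payload against a header budget and shortening the message before it is encoded. It is not the actual fix: `ASSUMED_HEADER_BUDGET`, `base64_encoded_len`, `fits_in_header`, and `truncate_to_budget` are hypothetical names, and the 16 KiB budget is an illustrative assumption rather than a value taken from tonic or hyper. The size estimate assumes binary metadata is sent base64-encoded with padding, which inflates the payload by roughly a third.

```rust
/// Hypothetical header budget in bytes. This is an illustrative assumption,
/// not a limit taken from tonic or hyper.
const ASSUMED_HEADER_BUDGET: usize = 16 * 1024;

/// Length of `raw_len` bytes after padded base64 encoding.
fn base64_encoded_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Returns true if the payload, once base64-encoded, fits the assumed budget.
fn fits_in_header(payload: &[u8]) -> bool {
    base64_encoded_len(payload.len()) <= ASSUMED_HEADER_BUDGET
}

/// Truncates `message` to at most `max_bytes` bytes without splitting a
/// UTF-8 character.
fn truncate_to_budget(message: &str, max_bytes: usize) -> &str {
    if message.len() <= max_bytes {
        return message;
    }
    // Walk back to the nearest character boundary at or below `max_bytes`.
    let mut end = max_bytes;
    while !message.is_char_boundary(end) {
        end -= 1;
    }
    &message[..end]
}
```

A caller could run `fits_in_header` on the serialized error and, if it fails, re-serialize with the message passed through `truncate_to_budget` first. Truncation loses part of the backtrace, so this trades completeness for not hitting the limit.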

Upstream issues:

At the moment there seems to be no fix. I'll disable SCHEMA_REGISTRY_DEBUG for now as a workaround and open an issue for this.

Originally posted by @BugenZhao in #17889 (comment)

@BugenZhao BugenZhao added the type/bug Something isn't working label Aug 14, 2024
@github-actions github-actions bot added this to the release-2.0 milestone Aug 14, 2024
@BugenZhao

With this configuration exposed, we're able to work around this issue:

hyperium/tonic#1835

Waiting for a new version to be released.

@BugenZhao

Worked around in #18639

@BugenZhao BugenZhao removed this from the release-2.1 milestone Sep 30, 2024