Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

500 server error when using HTTP API cat and stat to node using s3 datastore #217

Open
shc261392 opened this issue Mar 14, 2022 · 6 comments
Assignees
Labels
need/analysis Needs further analysis before proceeding P2 Medium: Good to have, but can wait until someone steps up status/inactive No significant work in the previous month

Comments

@shc261392
Copy link

shc261392 commented Mar 14, 2022

Hi, I've met an issue using IPFS node built with the go-ds-s3 plugin. For now I can't be very sure about what the root cause is, but I'll provided as many information as possible, would be very appreciated if anyone could help.

Problem

When using the built IPFS node's HTTP API api/v0/cat and api/v0/stat, sometimes a 500 error (Internal Server Error) will be returned. So far we have tried but can't find a stable way to reproduce this issue. From what we've observed, calling the exactly same API endpoint will same headers and query parameters does not always meet this problem.

The same problem never happens on another IPFS node using the default datastore without this plugin, so we think the might be an issue related to the plugin.

Making API calls like api/v0/add or api/v0/get seems always works however.

EDIT: the problem also happens for api/v0/add

Logs that might be related to the problem

We have turned the IPFS daemon's debug flag --debug on, but it produces very verbose logs. Since I'm not so familiar with how to debug IPFS daemon, I can only list some error messages that I think might be related the problem. I can provide more detailed logs or look for some certain debugging messages if I know where to look at, please let me know if there's something else that might help.

Logs that we think might be related to the problem:

2022-03-10T10:39:43.034Z    ERROR   engine  decision/blockstoremanager.go:92    blockstore.GetSize(bafybeied5upazh2w4hwhmaoduhd7c2n37q2mil4afzp3ycyycxhjszo2oe) error: RequestError: send request failed
 caused by: Head "https://ipfs-s3-datastore.s3-accelerate.amazonaws.com/ipfs-s3-datastore/blocks/CIQIH3I6BSPVNYPMOYA4HIOH6FU3X7BUYQXYALS7XQFRQFOOTFS5U4I": dial tcp 99.84.182.227:443: connect: cannot       assign requested address
2022-03-10T10:39:43.036Z    ERROR   engine  decision/blockstoremanager.go:92    blockstore.GetSize(QmTVebXpZShgdBRtP7j8VnvX3HGbjjuUZotssXs39HnAyj) error: RequestError: send request failed
 caused by: Head "https://ipfs-s3-datastore.s3-accelerate.amazonaws.com/ipfs-s3-datastore/blocks/CIQEZGEIUOETNBK34TFYDMESR7IUMNH6Z2J6WSBE4GYJ4W7N26T6LJQ": dial tcp 99.84.182.227:443: connect: cannot       assign requested address
2022-03-10T10:39:43.350Z    DEBUG   dht go-libp2p-kad-dht@v0.15.0/query.go:513  error connecting: failed to dial 12D3KooWPwYnvy9zdTnBj31zAVrrdvLWNmBW91vnUhf3Ku2GCPhf:
   * [/ip6/2a02:908:2211:d100:ccbc:611c:3deb:a84d/tcp/54806] dial tcp6 [2a02:908:2211:d100:ccbc:611c:3deb:a84d]:54806: connect: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:ccbc:611c:3deb:a84d/tcp/4001] dial tcp6 [2a02:908:2211:d100:ccbc:611c:3deb:a84d]:4001: connect: cannot assign requested address
   * [/ip6/::1/tcp/4001] dial tcp6 [::1]:4001: connect: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:4b7:766b:dfe8:9037/tcp/4001] dial tcp6 [2a02:908:2211:d100:4b7:766b:dfe8:9037]:4001: connect: cannot assign requested address
   * [/ip6/::1/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[::1]:4001: sendto: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:4b7:766b:dfe8:9037/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[2a02:908:2211:d100:4b7:766b:dfe8:9037]:4001: sendto: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:ccbc:611c:3deb:a84d/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[2a02:908:2211:d100:ccbc:611c:3deb:a84d]:4001: sendto: cannot assign requested address
   * [/ip4/192.168.0.59/udp/4001/quic] context deadline exceeded
   * [/ip4/192.168.0.59/tcp/4001] dial tcp4 0.0.0.0:4001->192.168.0.59:4001: i/o timeout
   * [/ip4/37.201.146.171/tcp/54806] dial tcp4 0.0.0.0:4001->37.201.146.171:54806: i/o timeout
   * [/ip4/37.201.146.171/udp/45813/quic] timeout: no recent network activity
2022-03-10T10:39:43.701Z    DEBUG   dht net/message_manager.go:82   request failed to open message sender   {"error": "failed to dial 12D3KooWB3L8vLPpG6XHwPw3aFjpTYrKCNTniMw97sx8FU5MHqtM:\n  * [/ip6/::1/ tcp/4001] dial backoff\n  * [/ip6/2002:b6a2:68fa::b6a2:68fa/tcp/4001] dial backoff\n  * [/ip6/::1/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[::1]:4001: sendto: cannot assign requested          address\n  * [/ip6/2002:b6a2:68fa::b6a2:68fa/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[2002:b6a2:68fa::b6a2:68fa]:4001: sendto: cannot assign requested address\n  * [/ip4/182.162.104.250/udp/ 4001/quic] timeout: no recent network activity\n  * [/ip4/182.162.104.250/tcp/4001] dial tcp4 0.0.0.0:4001->182.162.104.250:4001: i/o timeout", "to":                                                       "12D3KooWB3L8vLPpG6XHwPw3aFjpTYrKCNTniMw97sx8FU5MHqtM"}
2022-03-10T10:39:45.818Z    DEBUG   dht go-libp2p-kad-dht@v0.15.0/query.go:513  error connecting: failed to dial 12D3KooWBCHsSNQhhBwp96vxYJX6HdYAtyFwAXiE2b4NoAx9Y43e:
   * [/ip6/::1/tcp/4001] dial tcp6 [::1]:4001: connect: cannot assign requested address
   * [/ip6/::1/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[::1]:4001: sendto: cannot assign requested address
   * [/ip4/61.4.101.79/udp/4001/quic] timeout: no recent network activity
   * [/ip4/61.4.101.79/tcp/4001] failed to negotiate security protocol: read tcp4 172.20.0.2:4001->61.4.101.79:4001: read: connection reset by peer

Environment

Environment Version
OS Ubuntu 20.04
Golang 1.17.6
go-ipfs 0.11.0
go-ds-s3 1ad440b
S3 Region us-east-1
S3 Region Endpoint S3 Acceleration Endpoint

The IPFS daemon is built from source with the plugin and run in a Docker container, Dockerfile link here.

Only one IPFS daemon is running, and the S3-bucket only serves the daemon, not being used by anything else.

@shc261392 shc261392 added the need/triage Needs initial labeling and prioritization label Mar 14, 2022
@welcome
Copy link

welcome bot commented Mar 14, 2022

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@aeiou335
Copy link

api/v0/add also will return 500 internal error sometimes.

@guseggert
Copy link
Contributor

Is there a body to the response? If so it might have an error message in it. Also the error is logged if you turn on debug logging for the "cmds" system ipfs log level cmds debug.

@shc261392
Copy link
Author

There is only Internal Server Error message, no additional info.

After setting the cmds subsystem log to debug level, here's what I've found that could be relevant:

2022-03-18T04:24:18.494Z	DEBUG	cmds	go-ipfs-cmds@v0.6.0/command.go:145	error occured in call, closing with error: failed to get block for bafybeiaz5zrdvjtoqmibprkk6g7oxbdo2zmdxuu3j5agbaaczph7ylckjq: RequestError: send request failed
caused by: Get "https://ipfs-s3-datastore.s3-accelerate.amazonaws.com/ipfs-s3-datastore/blocks/AFYBEIAZ5ZRDVJTOQMIBPRKK6G7OXBDO2ZMDXUU3J5AGBAACZPH7YLCKJQ": dial tcp 13.32.196.208:443: connect: cannot assign requested address
IAWYXOAXCSAW5P2MA": dial tcp 13.32.196.208:443: connect: cannot assign requested address

@guseggert
Copy link
Contributor

guseggert commented Mar 18, 2022

Awesome thanks. This smells like ephemeral port exhaustion, could you check the number of TCP conns in TIME-WAIT state? ss -at | tail -n +2 | awk '{print $1}' | sort | uniq -c | sort -n

Although the fact that this error is also happening on both TCP and UDP swarm sockets suggests there may be something deeper wrong with your host, like misconfigured networking.

@shc261392
Copy link
Author

Here's what I get for running the above command:

$ ss -at | tail -n +2 | awk '{print $1}' | sort | uniq -c | sort -n
     16 LISTEN
    454 ESTAB
    503 TIME-WAIT

The node is running on an EC2 instance in the same region as the S3 bucket, the only network-related config I've added is allowing HTTP and HTTPS inbound traffic in security group which is probably not related to the issue. I'm not sure what should I check next.

@guseggert guseggert removed the need/triage Needs initial labeling and prioritization label Sep 15, 2022
@guseggert guseggert added P2 Medium: Good to have, but can wait until someone steps up need/analysis Needs further analysis before proceeding status/inactive No significant work in the previous month labels Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/analysis Needs further analysis before proceeding P2 Medium: Good to have, but can wait until someone steps up status/inactive No significant work in the previous month
Projects
None yet
Development

No branches or pull requests

3 participants