
error: [Errno 11] Resource temporarily unavailable - gunicorn and flask #514

Closed
thomaswhitcomb opened this issue Apr 21, 2013 · 32 comments

Comments

@thomaswhitcomb

Related to #416

I'm running Flask with gunicorn and am getting the exception [Errno 11] Resource temporarily unavailable in my @app.teardown_request handler.

I'm running these versions: Flask==0.9 and gunicorn==0.17.2
Here is my stack trace:
Traceback (most recent call last):
File "/home/tom/venv/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 39, in run
client, addr = sock.accept()
File "/usr/lib/python2.7/socket.py", line 202, in accept
sock, addr = self._sock.accept()

@thomaswhitcomb
Author

I can easily recreate it.

@benoitc
Owner

benoitc commented Apr 21, 2013

How are you running gunicorn?

@thomaswhitcomb
Author

I run it from the command line as "gunicorn package:app".

I tested it on both OS X and Ubuntu and got the same error on both platforms.

@benoitc
Owner

benoitc commented Apr 21, 2013

Do you have any sample app I can test?

@thomaswhitcomb
Author

I was testing my entire app via the simple Flask development server (app.run). I then switched over to gunicorn once I had the app working and found the problem (errno 11...). I stripped the app down to the simplest hello world and it still fails under gunicorn. I must be missing something very obvious. Here is the app:


import traceback
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "hello world"

@app.teardown_request
def teardown_request(exception):
    if exception is None:
        print "ok"
    else:
        print traceback.format_exc()


Fails every time.

gunicorn --version => gunicorn (version 0.17.2)
flask 0.9
uname -a
Linux toms 3.2.0-40-generic-pae #64-Ubuntu SMP Mon Mar 25 21:44:41 UTC 2013 i686 i686 i386 GNU/Linux

@benoitc
Owner

benoitc commented Apr 21, 2013

Thanks for the example. This error is expected [1] and ignored in gunicorn. You will notice that the request is correctly handled and finishes, i.e. there is no failure.

What happens is that on some fast and fairly idle systems the accept is attempted by more than one worker for the same client (which is expected). The first one to answer wins, and the others get this error on the socket.

Maybe this error should be removed from the stack trace. I'm not sure about it right now. (cc @tilgovi @sirkonst)

[1] https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/sync.py#L50
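
For readers trying to follow this, here is a minimal sketch of the pattern being described (not the actual gunicorn source): several workers wait on the same non-blocking listening socket, and whichever worker loses the accept race simply ignores EAGAIN and goes back to waiting. The handle callback is hypothetical.

import errno
import select
import socket

def accept_loop(listener, handle):
    # handle is a hypothetical per-connection callback
    listener.setblocking(0)  # non-blocking, so accept() can fail with EAGAIN
    while True:
        # wake up when the listener looks readable (a client may have connected)
        select.select([listener], [], [], 1.0)
        try:
            client, addr = listener.accept()
        except socket.error as e:
            if e.args[0] in (errno.EAGAIN, errno.EWOULDBLOCK):
                continue  # another worker already accepted this client
            raise
        handle(client, addr)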

@thomaswhitcomb
Author

Thanks for looking into this. As far as I can tell, the exception occurs on every request. Am I to ignore it? How will I determine when a real error exists?

@tilgovi
Collaborator

tilgovi commented Apr 21, 2013

Seems like we could change the log level for this one.

@benoitc
Owner

benoitc commented Apr 22, 2013

@tilgovi this is a log that comes from Flask inspecting the stack trace. We should either leave it as is or clear the stack trace and log it ourselves. Is this what you mean?

@thomaswhitcomb
Author

I have only been running with a single worker. Is it strange that there is a race condition on the socket?


@benoitc
Owner

benoitc commented Apr 22, 2013

How many connections?


@thomaswhitcomb
Author

Connections meaning inbound TCP requests? One at a time. I am just doing single-user testing.


@thomaswhitcomb
Author

I think I see. Does gunicorn always start one extra worker? Even with gunicorn -w 1 simp:app, I see two processes via ps aux.

Oh, maybe I got that wrong: one is the process doing the select and the other is actually honoring the request?
Yeah, I was wrong. Sorry about the spam.

@benoitc
Owner

benoitc commented Apr 22, 2013

@thomaswhitcomb np, I made an optimisation there to make sure the workers won't be launched simultaneously, so they don't listen at the same time.

Anyway, the next major iteration will also improve the accept loop.

@remohammadi

I was getting these errors 5 to 100 times per day: sometimes lots of these emails within an hour, and sometimes just 2 or 3 and then nothing for hours. I upgraded gunicorn to the latest version, 18.0, which contains 3ade8e8, but it didn't change the frequency of the emails.

Today I discovered that it's actually a failure, and the visitor sees a 500. The frequency of the emails is NOT correlated with the load/traffic/IO/CPU of the server.

I applied this solution about 10 hours ago and it seems to have solved the problem; I haven't received any errors yet. I can't understand why increasing somaxconn solves the problem. What type of connection is lined up in the queue, especially when the load on the server is far from its peak?

Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/usr/local/lib/python2.7/dist-packages/django/db/transaction.py", line 223, in inner
    return func(*args, **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/sync.py", line 45, in run
    client, addr = sock.accept()

  File "/usr/lib/python2.7/socket.py", line 202, in accept
    sock, addr = self._sock.accept()

error: [Errno 11] Resource temporarily unavailable

@remohammadi

Got 7 errors again, 23 hours after the previous sequence of "Resource temporarily unavailable" errors.

@tilgovi
Collaborator

tilgovi commented Jan 28, 2014

@remohammadi are you sure this results in 500 errors to the client? It may even be a different issue. I don't think SOMAXCONN is the issue here.

@remohammadi

I don't remember what observation made me conclude that the user is getting a 500 :( I'm still getting the error emails, but the frequency is low, so I'm ignoring them.

@david-saracini

I'm getting this same error with gunicorn + Flask. It's intermittent but seems to be happening a lot over the last 24 hours. Any updates here?

@pdodde

pdodde commented Feb 25, 2014

This looks like a race condition. I think you're getting real socket exceptions, but your web app isn't reading the message from socket.error before gunicorn calls accept() again on the socket. The exception you see is thrown every time gunicorn calls accept() on a socket that doesn't have a client request waiting, but it is ignored. Each Python socket object has one error object.

One-line pull request if somebody with problems wants to try it: #709

@mgraupner

I don't know if it's related, but I'm also receiving the [Errno 11] Resource temporarily unavailable message when using Gunicorn with Flask and MySQL. After adding some debug statements to the @teardown method in Flask-SQLAlchemy I got the [Errno 11] Resource temporarily unavailable error in response_or_exc, but only when using the autocommit feature, so changes without an explicit commit were never submitted to the database (SQLALCHEMY_COMMIT_ON_TEARDOWN was activated).

# teardown hook inside Flask-SQLAlchemy (self is the SQLAlchemy instance);
# the print was added for debugging
@teardown
def shutdown_session(response_or_exc):
    print response_or_exc
    if app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN']:
        if response_or_exc is None:
            self.session.commit()
    self.session.remove()
    return response_or_exc

After a lot of searching I switched from the sync worker to the gevent worker (found this solution here: http://serverfault.com/questions/427600/gunicorn-django-nginx-unix-socket-failed-11-resource-temporarily-unavail/606371#606371) and now everything works as expected. I also tried raising somaxconn before switching workers but was still receiving the error message. I don't know what consequences switching workers has (I'm still new to Python and Flask), so I would like to know how I could use the sync worker without getting this error.
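
For reference, the worker switch described above is a one-line change; a minimal sketch of a gunicorn config file (the file name and worker count are illustrative, and the gevent package must be installed):

# gunicorn_conf.py
worker_class = "gevent"  # instead of the default "sync" worker
workers = 2

# run with: gunicorn -c gunicorn_conf.py package:app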

@tilgovi
Collaborator

tilgovi commented Feb 9, 2015

Maybe the sync worker is not always closing requests properly when there are error responses?

@pdodde

pdodde commented Feb 9, 2015

If it helps, I found that our code was logging an exception when it should have been logging an error. If I remember correctly, there is one global variable in Python that holds the last stack trace. Since gunicorn polls the socket over and over and catches this Errno 11 exception every time, that is the stack trace you will see if you print one when an exception hasn't actually been raised.
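
To illustrate the point (a standalone sketch, not gunicorn code): under Python 2, traceback.format_exc() reports the most recently handled exception even when nothing is currently being raised, which is why a teardown handler can end up printing gunicorn's EAGAIN trace.

import traceback

try:
    1 / 0
except ZeroDivisionError:
    pass  # handled and apparently forgotten

# Python 2 still remembers the last handled exception at this point, so this
# prints the ZeroDivisionError traceback even though no exception is active.
# (Python 3 clears the exception state when the except block exits.)
print traceback.format_exc()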

@mgraupner

@tilgovi How would I debug this? There seem to be no errors in the response in my development environment where I don't use gunicorn and neither are there any errors using the gevent worker.

@ohadperry

I tried the gevent solution and it's not closing old workers. Did anyone else encounter this behaviour?

@benoitc
Owner

benoitc commented Apr 27, 2015

@ohadpartuck what is the gevent solution? Anything reproducible?

Also, this ticket has been closed; if you have an issue please open a new one, possibly linked to this one. It's hard to track the other way.

@ohadperry

worker_class = 'sync' vs worker_class = 'gevent'.
It's very much related to this ticket because of the "Resource temporarily unavailable" error I am experiencing.

@benoitc
Owner

benoitc commented Apr 27, 2015

Still, what is the gevent solution? How can we reproduce the issue?

This ticket has been closed anyway and new discussion should happen on a new ticket; the issue may or may not be different...

@rafaelosoto

Hey guys, just wondering if this was still being addressed or not? I've been having a similar issue. Flask seems aware but reluctant to do anything: pallets/flask#984

@pawl

pawl commented Jun 13, 2017

I was able to reproduce this issue here (using the ab command in the readme): https://github.com/pawl/somaxconn_test

Increasing the net.core.somaxconn on only the container ended up fixing it.
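
A note for anyone trying the same fix: gunicorn also has its own backlog setting, and the value it passes to listen() is capped by the kernel's net.core.somaxconn, so both values matter. A hedged sketch of the relevant gunicorn config (bind address and worker count are illustrative):

# gunicorn_conf.py
bind = "0.0.0.0:8000"
workers = 3
# gunicorn asks the kernel for a listen() backlog of this size (default 2048),
# but the kernel silently caps it at net.core.somaxconn, so raising only the
# sysctl or only this setting may not change the effective queue length.
backlog = 2048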

@ilanbiala

Is there a new issue where this is being discussed?

@eedwards-sk

eedwards-sk commented Apr 18, 2018

@benoitc

I'm seeing this same issue when running with the datadog ddtrace-py library (with gunicorn 19.7.1):

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/sync.py", line 27, in accept
    client, addr = listener.accept()
  File "/usr/lib/python2.7/socket.py", line 206, in accept
    sock, addr = self._sock.accept()
error: [Errno 11] Resource temporarily unavailable

The request is successful (200) but it's still generating a stack trace.

The run command looks like this:

ddtrace-run gunicorn core.app:create_app() -b 0.0.0.0:5000 -w 3

pgjones added a commit to pgjones/hypercorn that referenced this issue Jan 1, 2019
There is a strange bug on windows whereby this stack trace is seen,

  File "c:\users\administrator\qtest\hypercorn\hypercorn\asyncio\run.py", line 140, in run_single
    server = loop.run_until_complete(create_server)
  File "C:\Program Files\Python37\lib\asyncio\base_events.py", line 574, in run_until_complete
    return future.result()
  File "C:\Program Files\Python37\lib\asyncio\base_events.py", line 1387, in create_server
    server._start_serving()
  File "C:\Program Files\Python37\lib\asyncio\base_events.py", line 277, in _start_serving
    sock.listen(self._backlog)
  OSError: [WinError 10022] An invalid argument was supplied

I think this is a race condition when two processes listen on the same
socket at the same time. However I can only intermittently reproduce
this bug.

This fix follows Gunicorn's practice (although Gunicorn applies this
to all platforms), as discussed in
benoitc/gunicorn#514 and implemented in
benoitc/gunicorn@3ade8e8.