Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user_id always None for DRF requests when using RequestMiddleware #37

Closed
immortaleeb opened this issue May 31, 2020 · 17 comments · Fixed by #38
Closed

user_id always None for DRF requests when using RequestMiddleware #37

immortaleeb opened this issue May 31, 2020 · 17 comments · Fixed by #38

Comments

@immortaleeb
Copy link

immortaleeb commented May 31, 2020

In my Django application I'm using the django_structlog.middlewares.RequestMiddleware to log all of the requests to my application.
I noticed that user_id is always None for all the calls made to my Django Rest Framework (DRF) urls.

Now, I know this is because RequestMiddleware tries to access request.user, which is only filled in when you're going through the Django authentication mechanism, which DRF bypasses completely. Instead, it performs authentication in the view layer, so after all middleware code has run.

I guess other people have encountered this too so I was wondering if there is any workaround so that I can track which user performed which request from "the start" until "the end" of the request?

I tried several things, but all of them feel like I'm fighting with DRF to get it to do something it doesn't support.

@jrobichaud
Copy link
Owner

Can this be because of the order of the middlewares in settings.py?

@jrobichaud
Copy link
Owner

In the demo of django-structlog it is last, after the authentication middleware.

@immortaleeb
Copy link
Author

I'm afraid it's not that simple, the middleware is actually already one of the last ones (only followed by the celery middleware):

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'django_structlog.middlewares.RequestMiddleware',
    'django_structlog.middlewares.CeleryMiddleware',
]

Maybe something else that I didn't make clear is that I'm using DRF's rest_framework.authentication.TokenAuthentication to authenticate with my REST apis.
As mentioned in my original post, this means that authentication is performed after all of the middlewares.

@immortaleeb
Copy link
Author

I found people encountering the same DRF issue but in a different context here: jpadilla/django-rest-framework-jwt#45 (comment)

As far as I can find request.user is only going to point to an actual user after it has been processed by DRF. So it's only available after processing the response (not at request time). Also see this PR: encode/django-rest-framework#2155

@jrobichaud
Copy link
Owner

jrobichaud commented May 31, 2020

DRF token authentication is not tested in django-structlog and now that you mention this issue I just remembered DRF handle the authentication in its view not in middleware.

It does work with DRF and django session authentication but it is not how drf is used normally. 🤦

I will investigate this and try to figure out a workaround.

But a quick workaround would be to implement a mixin or an intermediary view wrapping DRF authentication then bind/log the user. It is not ideal but it would work. 🤷

@immortaleeb
Copy link
Author

Ok, thanks for clarifying.

The problem with an intermediary view is that the code would be executed after request_started has been logged. Although user_id would be filled in at request_finished, it would still be None at request_started, but I guess I can live with that.

Alternatively, I could implement a middleware that executes after django.contrib.auth.middleware.AuthenticationMiddleware which calls DRF's authentication explicitly if request.user is not set, similar to what some people did in jpadilla/django-rest-framework-jwt#45 (comment) . The downside of that would be going to the database twice for authentication (although you could probably fix that with caching).

@jrobichaud
Copy link
Owner

There are no optimal solution until DRF fix the issue you mentioned. You have to choose your poison until then.

I will update the documentation to reflect this issue.

I would appreciate any pull requests or just code snippets to help working around this.

@immortaleeb
Copy link
Author

I wouldn't really know how to integrate this properly with the existing code base, but I can share the solution I've implemented.

I went for the mixin approach, wrapping each of DRF's APIView and subclasses. I added this to my app's views.py file:

import structlog

from structlog.threadlocal import bind_threadlocal

from rest_framework.views import APIView as DRFAPIView
from rest_framework.viewsets import ModelViewSet as DRFModelViewSet

class StructlogMixin:
    def initial(self, request, *args, **kwargs):
        super().initial(request, *args, **kwargs)

        if hasattr(request, 'user'):
            user_id = request.user.pk

            # Force user_id to be visible in response log
            structlog.get_logger('django_structlog.middlewares.request').bind(user_id=user_id)
            bind_threadlocal(user_id=user_id)

class APIView(StructlogMixin, DRFAPIView):
    pass

class ModelViewSet(StructlogMixin, DRFModelViewSet):
    pass

Then, instead of using DRF's APIView and ModelViewSet I instead use the overridden ones. You can easily do this by replacing imports:

from rest_framework.views import APIView

becomes

from myapp.views import APIView

Then you also need to make sure that the threadlocal context is cleared after every request because threads might be reused for different HTTP request. I did this by adding a middleware:

from structlog.threadlocal import clear_threadlocal

class StructlogThreadLocalMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)

         # Clear threadlocal after every response
        clear_threadlocal()

        return response

I add this right before the first middleware that logs information (in my case this is django-structlog's request middleware):

MIDDLEWARE = [
    # ...
    'myapp.middlewares.StructlogThreadLocalMiddleware',
    'django_structlog.middlewares.RequestMiddleware',
]

Finally, I also added merge_threadlocal as the first processor in my structlog config because we're using structlog's threadlocal functionality:

structlog.configure(
    processors=[
        structlog.threadlocal.merge_threadlocal,
        # ...
    ]
    # ...

Note that this still has the downside of logging user_id=None for each request_started, but all the logs after that should include the correct user_id and you can still track the user_id by searching for the request_finished event with the same request_id.

@jrobichaud
Copy link
Owner

In django-structlog we could just log the user_id in the request finish as well (or with a setting). You would not have do anything fancy. What do you think?

@immortaleeb
Copy link
Author

Hmm, I don't know. That would mean that user_id is always None except for the request_finished log line.
With my current (although hacky) solution, user_id is only None for request_started. All the log lines in between actually have a user_id.

On the other hand you could argue that this isn't django-structlog's responsibility because DRF doesn't support hooking into the view layer to support cases like these (afaik). In that case, binding a user_id a second time is probably the only thing you can do.

@prik2693
Copy link

prik2693 commented Jun 13, 2020

@jrobichaud
In django-structlog we could just log the user_id in the request finish as well (or with a setting). You would not have do anything fancy. What do you think?

How do we log any variable (eg: user_id) in the request_finished log line only and not to the log lines between request_started and request_finished ?

@jrobichaud
Copy link
Owner

@jrobichaud
In django-structlog we could just log the user_id in the request finish as well (or with a setting). You would not have do anything fancy. What do you think?

How do we log any variable (eg: user_id) in the request_finished log line only and not to the log lines between request_started and request_finished ?

It is not supported at the moment, you can create an issue for this.

@jrobichaud
Copy link
Owner

I made the workaround we discussed, a setting to add the user_id in the request finished.

@jrobichaud
Copy link
Owner

In 1.5.3 I added a setting to add user_id in request_finished and request_failed.
In 1.5.4 I just made sure user_id is in request_finished and request_failed all the time instead.

@v1k45
Copy link

v1k45 commented Oct 30, 2020

Have you considered evaluating DRF authentication classes in the middleware and then setting the user based on that?

That way all log events will contain the user_id. I have implemented this using signal receivers to make sure user_id and some other attributes are present in all events.

Here is a simplified version:

@receiver([bind_extra_request_metadata, bind_extra_request_finished_metadata, bind_extra_request_failed_metadata])
def bind_user(request, logger, **kwargs):
    """
    Bind Django/REST Framework user to the logger.
    """
    # Retrieve DRF user from auth headers if it is not already cached.
    token_user: Optional[TokenUser] = getattr(request, "_drf_token_user", None)
    if not hasattr(request, "_drf_token_user"):
        try:
            auth = JWTTokenUserAuthentication().authenticate(request)
            if auth:
                token_user = auth[0]
        except (TokenError, AuthenticationFailed):
            pass

    # Cache token user to prevent unnecessary JWT decode
    request._drf_token_user = token_user

    # Bind DRF token user if available.
    if token_user is not None:
        logger.bind(user_id=token_user.pk)
    # Fallback to django auth
    elif request.user:
        logger.bind(user_id=request.user.pk)

To make it more generic we can read from REST_FRAMEWORK['DEFAULT_AUTHENTICATION_CLASSES'].

A downside of this is that it is somewhat opinionated and adds overhead of processing authentication twice.

@jrobichaud
Copy link
Owner

It is an interesting solution, however views may redefine the authentication class.
We would need to get view associated with the url, then get its authentication classes.

@jrobichaud
Copy link
Owner

jrobichaud commented Oct 31, 2020

@v1k45 could you try to the class from the view?

I would add this workaround in the documentation or in a helper module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants