Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There is a way to access an underlying mapping in MappingProxyType #88004

Closed
serhiy-storchaka opened this issue Apr 14, 2021 · 17 comments
Closed
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Copy link
Member

BPO 43838
Nosy @gvanrossum, @rhettinger, @ncoghlan, @serhiy-storchaka, @brandtbucher, @domdfcoding
PRs
  • bpo-43838: Use a copy to delegate types.MappingProxyType operators #27300
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-07.07:55:34.904>
    created_at = <Date 2021-04-14.07:34:28.852>
    labels = ['interpreter-core', 'type-bug', '3.8', '3.9', '3.10']
    title = 'There is a way to access an underlying mapping in MappingProxyType'
    updated_at = <Date 2021-08-07.08:52:38.535>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2021-08-07.08:52:38.535>
    actor = 'ncoghlan'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-08-07.07:55:34.904>
    closer = 'gvanrossum'
    components = ['Interpreter Core']
    creation = <Date 2021-04-14.07:34:28.852>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43838
    keywords = ['patch']
    message_count = 13.0
    messages = ['391039', '397244', '397245', '398023', '398025', '398033', '398045', '398072', '398700', '399022', '399057', '399059', '399171']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'rhettinger', 'ncoghlan', 'serhiy.storchaka', 'brandtbucher', 'domdfcoding']
    pr_nums = ['27300']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue43838'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @serhiy-storchaka
    Copy link
    Member Author

    The purpose of MappingProxyType is to provide a read-only proxy for mapping. It should not expose the underlying mapping because it would invalidate the purpose of read-only proxy. But there is a way to do this using comparison operator:

    from types import MappingProxyType
    
    orig = {1: 2}
    proxy = MappingProxyType(orig)
    
    class X:
        def __eq__(self, other):
            other[1] = 3

    assert proxy[1] == 2
    proxy == X()
    assert proxy[1] == 3
    assert orig[1] == 3

    In particularly it allows to modify __dict__ of builtin types.

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Apr 14, 2021
    @serhiy-storchaka
    Copy link
    Member Author

    Example of modifying a builtin type:

    >>> class Sneaky:
    ...     def __eq__(self, other):
    ...         other['real'] = 42
    ... 
    >>> int.__dict__ == Sneaky()
    >>> (1).real
    42

    But it can also lead to crash (due to outdated type cache):

    >>> class Sneaky:
    ...     def __eq__(self, other):
    ...         other['bit_length'] = 42
    ... 
    >>> int.__dict__ == Sneaky()
    >>> (1).bit_length
    Segmentation fault (core dumped)

    @ncoghlan
    Copy link
    Contributor

    I stumbled across this independently in bpo-44596, but missed this existing report due to the specific word I was looking for (encapsulation).

    In addition to the comparison operand coercion, there's now another access option through the __ror__ method.

    The bug in both cases arises from delegating a method/function implementation to the underlying mapping type in a way that invokes the full operand coercion dance. (PyObject_RichCompare in one case, PyNumber_Or in the other)

    Delegating these operations to the underlying mapping does make sense, but it needs to be a lower level delegation that bypasses the operand coercion dance, so the other operand only ever sees the proxy object, not the underlying mapping.

    @brandtbucher
    Copy link
    Member

    I believe that delegating to the actual underlying mapping without exposing it is a bit of a lost cause, since for some type m we still need these to work:

    >>> types.MappingProxyType(m({"a": 0)) | types.MappingProxyType(m({"b": 1}))
    m({'a': 0, 'b': 1}) 
    >>> types.MappingProxyType(m({"a": 0)) == types.MappingProxyType(m({"a": 0}))
    True

    (Note that because both sides are proxies, it's impossible for any resolution to happen without m explicitly knowing how to handle them unless both mappings are unwrapped simultaneously.)

    Instead, the attached PR delegates to a *copy* of the underlying mapping for these operations instead. I think this is the easiest way to approximate the current behavior while maintaining proper encapsulation.

    @gvanrossum
    Copy link
    Member

    Maybe we should not fix this then? Copying seems unattractive, and the
    reason we proxy in the first place is not absolute safety but to make sure
    people don’t accidentally update the dict when they intend to update the
    class (for the side effect of updating slots when e.g. __add__ is
    added/removed.--
    --Guido (mobile)

    @serhiy-storchaka
    Copy link
    Member Author

    For __or__ we need to copy the content of both mapping to the resulting mapping in any case. We can implement it as {**self, **other}. We should not use the copy() method because it is not a part of the Mapping interface.

    For __eq__, no copying is needed if we just re-implement Mapping.__eq__ (with special cases for few known types for performance).

    @ncoghlan
    Copy link
    Contributor

    Delegating operations to the underlying mapping is still OK, all we're wanting to bypass is the operand coercion dances in abstract.c and object.c that may expose the underlying mapping to the *other* operand.

    We don't want to implicitly copy though, as the performance hit can be non-trivial for large mappings, even if the algorithmic complexity doesn't change.

    For the nb_or slot, it should be correct to retrieve and call __or__ from the underlying mapping when the left operand is a MappingProxy instance, and __ror__ when the right operand is a MappingProxy instance (but the left operand is not).

    Rich comparison is messier though, since PyObject_RichCompare handles both the operand coercion dance *and* the mapping of the specific comparison operations to their implementation slots.

    I don't see a clean solution to that other than defining a new PyObject_DelegateRichCompare helper function for the mapping proxy to use that provides the mapping from specific operands to their implementation slots *without* doing the coercion dance.

    However we resolve it, I think it probably won't be worth the risk of backporting the change (this has been the way it currently is for a long time, and it's a "reduce the risk of error" feature rather than any kind of security boundary).

    @brandtbucher
    Copy link
    Member

    Delegating operations to the underlying mapping is still OK, all we're wanting to bypass is the operand coercion dances in abstract.c and object.c that may expose the underlying mapping to the *other* operand.

    But this won't work if *both* operands are proxies, right? The left wrapped mapping will only ever see itself and the right mappingproxy, and the right operand will only ever see itself and the left mappingproxy. The overwhelmingly common case is where these proxies wrap dicts, and dicts only play nicely with other dicts.

    I agree that creating redundant copies isn't optimal; it's just the only way that I see we'd be able to fix this without backward compatibility headaches. The fix is easy to explain to users without needing to get into the nuances of operator dispatch or bloating the code to handle weird edge-cases (trust me, I originally tried coding logic to handle all the different possibilities of subclasses and proxies and such before landing on the copying solution).

    With that said, I think that if we decide it's really not worth it to copy here, Serhiy's proposal is probably "good enough". Just return a merged dict for the union operation, and implement mapping equality in a sane, usable way. I just worry that it will break backward-compatibility in subtle ways that are basically impossible to fix (comparing equality of proxied OrderedDicts, for example).

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Aug 1, 2021

    Ouch, you're right - I forgot that dict just returned NotImplemented from comparisons and unions and made it the other operand's problem:

    >>> d1 = dict(x=1)
    >>> d2 = dict(x=1)
    >>> from types import MappingProxyType as proxy
    >>> d1.__eq__(proxy(d2))
    NotImplemented
    >>> d1.__or__(proxy(d2))
    NotImplemented

    Perhaps we could mitigate the performance hit by skipping the copy when the other side is either a builtin dict (subclasses don't count), or a proxy around a builtin dict?

    @rhettinger
    Copy link
    Contributor

    [Serhiy]

    In particularly it allows to modify __dict__ of builtin types.

    This doesn't look like something that would happen accidentally. We've never had any bug reports about this occurring in the wild.

    [GvR]

    Maybe we should not fix this then?

    That seems reasonable. In general, it is hard to completely wall-off instances against deliberate efforts to pry them open:

        >>> from types import MappingProxyType
        >>> import gc
        >>> orig = {1: 2}
        >>> proxy = MappingProxyType(orig)
        >>> refs = gc.get_referents(proxy)
        >>> refs
        [{1: 2}]
        >>> refs[0][1] = 3
        >>> orig
        {1: 3}

    @brandtbucher
    Copy link
    Member

    I’m okay with closing as “won’t fix”.

    @serhiy-storchaka
    Copy link
    Member Author

    Good catch! Thank you Raymond for the third example.

    I though that it can be easily fixed by calling tp_traverse() for submapping:

     static int
     mappingproxy_traverse(PyObject *self, visitproc visit, void *arg)
     {
         mappingproxyobject *pp = (mappingproxyobject *)self;
    -    Py_VISIT(pp->mapping);
    -    return 0;
    +    return Py_TYPE(pp->mapping)->tp_traverse(pp->mapping, visit, arg);
     }

    but it does not work.

    So I am okay with closing this issue.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Aug 7, 2021

    That resolution makes sense to me as well.

    Should we make a note in the documentation for https://docs.python.org/3/library/types.html#types.MappingProxyType that the "read-only" protection is to guard against *accidental* modification, not against active attempts to subvert the protection?

    @Tcll
    Copy link

    Tcll commented Aug 2, 2024

    thanks for not fixing this so I can completely change the functionality of python with this exploit and fix a ton of the broken syntax and false functionality :)

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Aug 9, 2024

    There's a reason we didn't consider this a meaningful security concern:

    $ cat hack_python.py
    #!/usr/bin/python3
    import sys
    
    def _hacked(*args, **kwds):
      sys.stdout.write("If they can run Python code, you've already lost\n")
    
    import builtins
    builtins.print = _hacked
    
    print("This is fine")
    $ ./hack_python.py
    If they can run Python code, you've already lost
    

    But yes, it does allow for reopening otherwise immutable classes and changing how their methods work.

    @Tcll
    Copy link

    Tcll commented Aug 13, 2024

    Yep I've caught a few other things as well, such as property not being read-only:

    >>> class A(object):
    ...     a = property(lambda i: 'you found Guido')
    ... 
    >>> a = A()
    >>> a.a
    'you found Guido'
    >>> A.a.fget = lambda i: 'fakeout'           
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: readonly attribute
    >>> A.a.__init__(lambda i: 'you lost Guido') # no it is certainly NOT readonly
    >>> a.a
    'you lost Guido'
    >>> 
    

    *should've used __new__ instead of __init__ so the result is a new property with the new getter rather than the same property with the new getter- 😏

    as well as not private attributes:

    >>> class A(object):    
    ...     __slots__ = ('__private')
    ...     def __new__(cls):
    ...         a = object.__new__(cls)
    ...         a.__private = 'this attribute is private'
    ...         return a
    ...     showprivate = lambda a: a.__private
    ... 
    >>> a = A()
    >>> a.showprivate()
    'this attribute is private'
    >>> a._A__private = 'no this attribute is NOT private'
    >>> a.showprivate()
    'no this attribute is NOT private'
    >>> 
    

    should've used a hidden member_descriptor instance removed from the class dict storing the private attribute value instead of just renaming the dict entry 😏

    All of these are just as bad of an exploit as each other

    The only thing that was even sort of decent for being read-only was the hidden functionality of member_descriptor applied to slice() attributes (could've also been used for property() attributes)

    but I guess now even that's blown out of the water with the mappingproxy exploit :P

    @ncoghlan
    Copy link
    Contributor

    There are many reasons why in process access restrictions in CPython only prevent inadvertent modifications, they don't prevent malicious ones. See https://lwn.net/Articles/574215/ for more of the problems with internal sandboxes (rather than external ones along the lines of Linux containers).

    This closed issue isn't the right place to discuss them though, so I'm going to go ahead and lock this.

    @python python locked as resolved and limited conversation to collaborators Aug 14, 2024
    Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
    Labels
    3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants