Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fallback to temp dir if cache does not exist #1784

Merged
merged 4 commits into from
Apr 20, 2022

Conversation

maybe-sybr
Copy link
Contributor

This fixes a misbehaviour in environments where XDG_CACHE_DIR is not
set and ~/.cache does not exist, e.g. in the docker image. We simply
attempt to create a TemporaryDirectory which will be held until the
pwn.context is destroyed and the reference is dropped, at which point
it should be cleaned up.

By avoiding situations where cache_dir can be None, things that
expect it to be a valid path string (e.g. libcdb.search_by_hash() and
the ROP cache helpers) won't blow up unless there is a surprising
failure to access the cache dir we intend to use for writing.

@maybe-sybr
Copy link
Contributor Author

This PR is targeting stable but it's a minor bug and could easily land in beta or dev depending on your preferences.

@Arusekk
Copy link
Member

Arusekk commented Feb 6, 2021

I think this is a breaking change, as the doc cases pretty explicitly state that the directory can change depending on its permissions and similar stuff. So it will be necessary to either remove the cache caching (yeah, funny) logic and just let it go every time, or to retarget against dev. What is bad in using ~/.cache? Shouldn't it just os.makedirs() the path if it is not found in place?

@heapcrash
Copy link
Collaborator

heapcrash commented Feb 6, 2021

I agree we should create the directory if it does not exist. We should probably use os.makedirs instead of os.mkdir.

Also, it's important that it's still possible to set context.cache_dir = None to disable caching. Does this PR support this use-case?

@@ -1232,13 +1233,22 @@ def cache_dir(self):
>>> cache_dir == context.cache_dir
True
"""
try:
return self._cache_dir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't check self._cache_dir but rather the defaults dictionary if it's missing.

pwnlib/context/__init__.py Outdated Show resolved Hide resolved
@@ -1251,7 +1261,8 @@ def cache_dir(self):
if not os.access(cache, os.W_OK):
return None

return cache
self._cache_dir = cache
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with how everything inside of context works, we should not be setting any attributes that are not TLS-backed, since each thread may modify the context and they should not interfere with each other.

This is why there's a lot of work in the context module to work off of TLS. Set the value you want in pwnlib.context.context.defaults if you really need a non-standard default .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now taken the approach of setting the base path as an entry in defaults and generating the cache_dir as a property based on that value. cache_base_dir is a new validated property and changing its value clears the cache_dir stashed in _tls. The cache_dir is not a validated property since I didn't want to give it a setter without specific input from you in that direction. It just has a getter which reaches into _tls for a well known key ("cache_dir", which should probably be a constant somewhere to avoid drift).

Let me know what you think of the new approach.

@maybe-sybr
Copy link
Contributor Author

I agree we should create the directory if it does not exist. We should probably use os.makedirs instead of os.mkdir.

Also, it's important that it's still possible to set context.cache_dir = None to disable caching. Does this PR support this use-case?

I think if you're happy with having it use os.makedirs() instead then this entire PR could be scrapped. The main goal was simply to pick an unobtrusive temporary place to cache things for the lifetime of the process when the we can't find or create the default path. LMK what you'd like to do.

@maybe-sybr
Copy link
Contributor Author

@Arusekk - thoughts on my last comment? I"m happy to close this in favour of an alternative PR which simply does os.makedirs(cache, exist_ok=True) and can submit that if you'd like.

@Arusekk
Copy link
Member

Arusekk commented Feb 20, 2021 via email

@maybe-sybr
Copy link
Contributor Author

Closing in favour of #1812

@maybe-sybr
Copy link
Contributor Author

Closing in favour of #1812

Lucky I clicked the wrong button here. #1812 uses a Python 3ism and per the feedback left on there, I think the approach here is more worthwhile to pursue. I've reworked the diff so it should be Python 2 compatible (mkdtemp and atexit rather than TemporaryDirectory :( ). I still need to look into any required testing since this is currently just eyeballed.

@maybe-sybr
Copy link
Contributor Author

I agree we should create the directory if it does not exist. We should probably use os.makedirs instead of os.mkdir.

Also, it's important that it's still possible to set context.cache_dir = None to disable caching. Does this PR support this use-case?

@heapcrash - was this ever actually possible? Per my other comment, cache_dir is a property with no setter and I've left it as such so AFAICT it can't currently be set to None. We could turn it into a @_validator in this diff if you'd like for that to be possible?

@heapcrash
Copy link
Collaborator

heapcrash commented Feb 24, 2021

@maybe-sybr First, thanks for the effort you're putting into this. If you want to discuss more informally, please join our Discord #development channel here: https://discord.gg/xbFVcTzV

I agree we should create the directory if it does not exist.

Agreed.

We should probably use os.makedirs instead of os.mkdir.

Agreed. Note that exist_ok does not work on Python2, so we will have to either catch the exception or do os.path.isdir first.

Also, it's important that it's still possible to set context.cache_dir = None to disable caching. Does this PR support this use-case?

@heapcrash - was this ever actually possible? Per my other comment, cache_dir is a property with no setter and I've left it as such so AFAICT it can't currently be set to None. We could turn it into a @_validator in this diff if you'd like for that to be possible?

I think you're correct that this was never an option to set to None, but setting to None should force the usage of a temporary directory.

I think the suggested implementation at this point are:

  • The property context.cache_dir should always be a six.text_type of the temporary directory path (versus being a TemporaryDirectory object)
    • This allows unicode characters to appear in cache_dir, which is important as the user's home directory may contain non-ASCII characters
  • On startup, attempt to create the default cache_dir which is e.g. ~/.cache/.pwntools-cache-3.8 with os.makedirs
    • Check if the directory already exists with os.path.isdir since we can't use exist_ok
    • If unable to create that directory, display a warning and use a temporary directory with the prefix 'pwntools-X.Y'
    • Whatever value we decide on startup should be set in context.defaults, not against context.cache_dir
      • For this reason I recommend below to refactor most of this code to a new pwnlib.cache module
  • It should be possible to set cache_dir to any existing directory
    • If the new directory does not exist, a ValueError exception should be raised. I don't think we can use log.error from the context module due to cyclic imports.
      • e.g. context.cache_dir = '/does/not/exist'
  • It should be possible to set context.cache_dir=None which sets it to a new temporary directory (with the 'pwntools-X.Y' prefix)
    • All temporary directories created via setting cache_dir to None should have an associated pwnlib.atexit.register() handler which invokes shutil.rmtree on the temporary directory
  • All temporary directories should be generated with tempfile.TemporaryDirectory and setting prefix='pwntools-X.Y' where X.Y is the current Python interpreter version
  • Upon any changes to

Something that is nice to have is to refactor some of the logic to keep pwnlib.context code as simple as possible is to create a new context.cache module.

  • Create the new module pwnlib.cache to some of this logic
    • Move the code for calculating the default cache path here
    • Move the code for generating a temporary directory here which is just a wrapper around TemporaryDirectory() which automatically sets prefix='pwntools-'
    • Add a function to create an further nested temporary directory or file
      • pwnlib.cache.file(*args, **kwargs) will create a named file inside the current context.cache_dir and return the name (just a wrapper around open(file, *args, **kwargs))
      • pwnlib.cache.makedirs(*args, **kwargs) will create a directory tree (or just a single directory) inside the current context.cache_dir (just a wrapper around os.makedirs(*args, **kwargs))
        • These two bits will save lots of os.path.join(context.cache_dir, sometime) calls
    • The context.cache_dir setter implementation should call into the functions in the new pwnlib.cache module

And finally, having tests for each of these is important, but we can cross that bridge when the implementation stabilizes.

@maybe-sybr
Copy link
Contributor Author

Just an update, this is on my list but I've been a bit busy the past couple of weeks. I've noted the ideas above and I think I'll get everything up to but not including the separate cache management module. That seems achievable for me in the short term and then we can build from that to a separate module moving forward if that's useful to do.

pwnlib/context/__init__.py Outdated Show resolved Hide resolved
pwnlib/context/__init__.py Outdated Show resolved Hide resolved
pwnlib/context/__init__.py Outdated Show resolved Hide resolved
@Arusekk Arusekk enabled auto-merge (squash) April 20, 2022 10:52
This fixes a misbehaviour in environments where `XDG_CACHE_DIR` is not
set and `~/.cache` does not exist, e.g. in the docker image. We simply
attempt to create a temporary directory which will be registered with
`atexit` to be deleted on exit using `shutil.rmtree()`. A Python 3
approach for future would be to use a `TemporaryDirectory` instead which
would get cleaned up when the held object gets dropped out of the TLS
stack.

It is still possible for the `context.cache_dir` to be `None` if the
default directory exists but is not writable, which means some code
which uses this property is liable to blow up in such a circumstance.
@Arusekk Arusekk force-pushed the fix/cache-tempdir-fallback branch 2 times, most recently from b4acbd5 to 09f6756 Compare April 20, 2022 11:34
@Arusekk Arusekk merged commit c93bdfe into Gallopsled:stable Apr 20, 2022
gogo2464 pushed a commit to gogo2464/pwntools that referenced this pull request Sep 10, 2023
This fixes a misbehaviour in environments where `XDG_CACHE_DIR` is not
set and `~/.cache` does not exist, e.g. in the docker image. We simply
attempt to create a temporary directory which will be registered with
`atexit` to be deleted on exit using `shutil.rmtree()`. A Python 3
approach for future would be to use a `TemporaryDirectory` instead which
would get cleaned up when the held object gets dropped out of the TLS
stack.

It is still possible for the `context.cache_dir` to be `None` if the
default directory exists but is not writable, which means some code
which uses this property is liable to blow up in such a circumstance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants