Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pseudo folders break directory detection code in fs.info #312

Closed
dynamix opened this issue Nov 27, 2020 · 8 comments · May be fixed by #313
Closed

Pseudo folders break directory detection code in fs.info #312

dynamix opened this issue Nov 27, 2020 · 8 comments · May be fixed by #313

Comments

@dynamix
Copy link
Contributor

dynamix commented Nov 27, 2020

Various tools/libraries create pseudo folders in GS, for example Tensorflow or using the "Create folder" function in the Google Storage Browser UI.

The code that identifies directories does not handle this properly: https://github.com/remerge/gcsfs/blob/be50a9a8d23a04529c40540a1cc123ce5b3512fc/gcsfs/core.py#L819-L820

The folder name will be an exact match and assumed to be a file on the first invocation. This breaks various other methods like isdir.

How to reproduce:

import gcsfs
import tensorflow as tf

path = "YOUR-BUCKET/foo"

tf.io.gfile.mkdir(f"gs://{path}")

fs = gcsfs.GCSFileSystem()
print(fs.isdir(path)) # => False
print(fs.isdir(path)) # => True

I'll create a PR tomorrow to fix this.

@martindurant
Copy link
Member

I would argue that path is indeed a file because ... it is a file, albeit empty. Directories are only ever implied by their contents, since their names are returned as "common prefixes", i.e., a different mechanism (and the names will not contain "/" at the end).
What were to happen if of the folder placeholders were to contain data?

@dynamix
Copy link
Contributor Author

dynamix commented Nov 30, 2020

Yes that is what I meant by pseudo folder. The issue here is that what info returns is not consistent due to trying to support the idea of folders. In the example isdir should be idempotent but it does returns false on the first invocation and true on the second.

The pseudo folders are usually indicated by a zero size objects ending with /. Detecting this would be my suggested fix: #313

@martindurant
Copy link
Member

In the example isdir should be idempotent but it does returns false on the first invocation and true on the second.

Totally agree that this is wrong, and there should be a test that ensures that repeated calls get the same result.

@isidentical
Copy link
Member

I also agree consistency is the important part.

@bilelomrani1
Copy link

Just wanted to check if there is any news on this issue?

@martindurant
Copy link
Member

This is likely fixed in the current release. Are you still experiencing a failing case, @bilelomrani1 ?

@bilelomrani1
Copy link

bilelomrani1 commented Jul 15, 2023

I apologize for the confusion caused by my previous comment. After further investigation, I realized that the issue I encountered was not related to what I initially thought. Instead, I discovered that the problem stemmed from the fact that fs.makedirs behaves as a no-op with gcsfs, which is a documented behavior.

@martindurant
Copy link
Member

no worries :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants