Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VSIMemGenerateHiddenFilename() and use it in drivers #10771

Merged
merged 51 commits into from
Sep 16, 2024

Conversation

rouault
Copy link
Member

@rouault rouault commented Sep 11, 2024

Tons of drivers need to create temporary /vsimem/ files. The current practice has been that drivers needed to figure out the filename by themselves. There were 2 major issues:

  • the need to make it unique (among drivers, and within a driver in the event of multiple datasets opened at the same time, multi-threading), which required trickery using pointer addresses or atomic counters.
  • and the fact that such files could be visible of user code that would happen to list the content of "/vsimem/" with VSIReadDir() / VSIReadDirRecursive(), which could cause unwanted effects (imagine a webservice that would output files in "/vsimem/", and need list and zip them, which one GDAL driver potentially creating files in another thread). Of course, users could also namespace their use of /vsimem/, but this is a unnecessary complication.

This PR thus adds the following function and use it wherever possible:

/**
 * \brief Generates a unique filename that can be used with the /vsimem/
 * virtual file system.
 *
 * This function returns a (short-lived) string containing a unique filename,
 * (using an atomic counter), designed for temporary files that must remain
 * invisible for other users working at the "/vsimem/" top-level, i.e.
 * such files are not returned by VSIReadDir("/vsimem/") or
 * VSIReadDirRecursive("/vsimem/)".
 *
 * The function does not create the file per se. Such filename can be used to
 * create a regular file with VSIFOpenL() or VSIFileFromMemBuffer(), or create
 * a directory with VSIMkdir()
 *
 * Once created, deletion of those files using VSIUnlink(), VSIRmdirRecursive(),
 * etc. is of the responsibility of the user. The user should not attempt to
 * work with the "parent" directory returned by CPLGetPath() / CPLGetDirname()
 * on the returned filename, and work only with files at the same level or
 * in subdirectories of what is returned by this function.
 *
 * @param pszFilename the filename to be appended at the end of the returned
 *                    filename. If not specified, defaults to "unnamed".
 *
 * @return pointer to a short-lived string (rotating buffer of strings in
 * thread-local storage). It is recommended to use CPLStrdup() or std::string()
 * immediately on it.
 *
 * @since GDAL 3.10
 */
const char *VSIMemGenerateUnlistedFilename(const char *pszFilename);

@rouault rouault force-pushed the invisible_vsimem branch 2 times, most recently from cf21f7e to 1104e38 Compare September 11, 2024 18:18
@rouault rouault added this to the 3.10.0 milestone Sep 11, 2024
@coveralls
Copy link
Collaborator

Coverage Status

coverage: 69.361% (+0.004%) from 69.357%
when pulling 1104e38 on rouault:invisible_vsimem
into 1e48b77 on OSGeo:master.

Copy link
Collaborator

@elpaso elpaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, did you consider to name this feature "HiddenFile" instead of "UnlistedFile" ?

port/cpl_vsi_mem.cpp Outdated Show resolved Hide resolved
@rouault
Copy link
Member Author

rouault commented Sep 13, 2024

did you consider to name this feature "HiddenFile" instead of "UnlistedFile" ?

indeed, I did hesitate on the naming. I considered HiddenFile but thought that it would imply the file would be totally "invisible", which is not fully the case if you know where it is / how it is named. But on reflection, a hidden file on an operating system is pretty much similar. You don't see hit if you don't enable special filters, but if you know where it is , you can access it.
I also considered VSIMemGenerateTempFilename, to be similar to the existing CPLGenerateTempFilename. What make me hesitating is that /vsimem/ files are by nature temporary or that the "Temp" would imply somehow they would be automatically deleted from memory at some point, which is not the case (well, until process termination obviously). But maybe adopting a similar name as CPLGenerateTempFilename() is better for symmetry.
So currently I'm like 50%-50% on VSIMemGenerateHiddenFilename or VSIMemGenerateTempFilename. Any preference?

@rouault
Copy link
Member Author

rouault commented Sep 13, 2024

Maybe @dbaston as a native speaker has a good suggestion on the best naming

Copy link
Contributor

@sgillies sgillies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rouault I'm going to stick to UUID strings in my own applications, but this is great 👍

//
// The high-level design constraint is that "/vsimem/.#!HIDDEN!#." acts as a
// "side" hiearchy, but still unders the "/vsimem/" namespace, so that code
// acting specifically on /vsimem/ files. The structure of the returned filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reads like an unfinished sentence (and I'm not sure how to finish it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant: "so that code having special processing of filenames starting with /vsimem/ can still work"

// - we don't want ".#!HIDDEN!#." to be listed in VSIReadDir("/vsimem/")
// - we don't want content under ""/vsimem/.#!HIDDEN!#" to be deleted by
// VSIRmdirRecursive("/vsimem/")
// - we don't want the creation of a file (or directory) called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding this. I would have expected

  • directories starting with .#!HIDDEN!#.' are not listed
  • the filename returned by the new function is in a regular directory with a name like like mktemp -u -d -p '/vsimem/.#!HIDDEN!#.'.

It sounds more complex than this. I would need to check it out and play around to understand it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine:

  • First call with VSIMemGenerateHiddenFilename("foo.gml") will return: "/vsimem/.!HIDDEN!./1/foo.gml"
  • Second call with VSIMemGenerateHiddenFilename("bar.shp") will return: "/vsimem/.!HIDDEN!./2/bar.shp"
  • Third call with VSIMemGenerateHiddenFilename("my_dir") will return: "/vsimem/.!HIDDEN!./3/my_dir", and user calls VSIMkdir("/vsimem/.!HIDDEN!./3/my_dir"), and craetes files "one" and "two" within it

Then

  • VSIReadDir("/vsimem/") does not mention ".!HIDDEN!"
  • VSIReadDir("/vsimem/.!HIDDEN!.") returns "1", "2" and "3", but this is an implementation detail, just for the purpose of our unit tests as code is not suppose to access the parent directories of the return path
  • VSIReadDir("/vsimem/.!HIDDEN!./3/my_dir") returns "one" and "two"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by VSIReadDir("/vsimem/.!HIDDEN!.") listing "1", "2", and "3", but then VSIStat telling me that "/vsimem/.!HIDDEN!./2" does not exist. Why is "2" not just a regular directory with a unique name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "2" not just a regular directory with a unique name?

"Explicit" directories, like regular files, are listed in the oFileList member variable. If they are listed there, someone must clean up them at some point using Unlink() or Rmdir().
For a file returned by VSIMemGenerateHiddenFilename("foo.tif"), so something like "/vsimem/.!HIDDEN!./123/foo.tif", we only want the returned filename to be something explicit, because that's the only resource the user must take care of deleting, but not its parent directories "/vsimem/.!HIDDEN!./123" or ""/vsimem/.!HIDDEN!."
Hence VSIReadDir("/vsimem/.!HIDDEN!."), which is normally just for debugging purposes, has special processing to synthetize the "{counter}" directories from the filenames like "/vsimem/.!HIDDEN!./123/foo.tif", but without needing an explicit "/vsimem/.!HIDDEN!./123" entry in oFileList
We could possibly have VSIStat() work on "/vsimem/.!HIDDEN!./123" but that doesn't appear to me as being strictly needed for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how treating these directories as a special case avoids the need to clean them up later. And I guess forcing every temporary file into a separate "directory" avoids potential collisions from sidecar files. It seems like a complex solution relative to following an existing API like mktemp, but not something I feel terribly strongly about.

port/cpl_vsi_mem.cpp Show resolved Hide resolved
@elpaso
Copy link
Collaborator

elpaso commented Sep 16, 2024

did you consider to name this feature "HiddenFile" instead of "UnlistedFile" ?

indeed, I did hesitate on the naming. I considered HiddenFile but thought that it would imply the file would be totally "invisible", which is not fully the case if you know where it is / how it is named. But on reflection, a hidden file on an operating system is pretty much similar. You don't see hit if you don't enable special filters, but if you know where it is , you can access it. I also considered VSIMemGenerateTempFilename, to be similar to the existing CPLGenerateTempFilename. What make me hesitating is that /vsimem/ files are by nature temporary or that the "Temp" would imply somehow they would be automatically deleted from memory at some point, which is not the case (well, until process termination obviously). But maybe adopting a similar name as CPLGenerateTempFilename() is better for symmetry. So currently I'm like 50%-50% on VSIMemGenerateHiddenFilename or VSIMemGenerateTempFilename. Any preference?

+1 for VSIMemGenerateHiddenFilename

@rouault
Copy link
Member Author

rouault commented Sep 16, 2024

+1 for VSIMemGenerateHiddenFilename

Renamed to VSIMemGenerateHiddenFilename

@rouault rouault changed the title Add VSIMemGenerateUnlistedFilename() and use it in drivers Add VSIMemGenerateHiddenFilename() and use it in drivers Sep 16, 2024
@rouault rouault merged commit ce6bae8 into OSGeo:master Sep 16, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants