alternative filename storage for very long filenames #7

vgough · 2014-08-22T05:18:08Z

Long filenames can exceed the filesystem limits after encryption & encoding. An alternative is to store filenames as file contents.

Just to revisit the current design, it is like this:

[ plain text filename ] --- encryption ---> [ encrypted blob ] --- base64 ---> [ encrypted filename ]

For short filenames (<= 176 bytes), nothing changes
For longer filenames, I think we should do this:

[ plain text filename ] --- encryption ---> [ encrypted blob ] --- sha256 --- hexdump ---> [ filename hash ]

The file would be called
longfn.[ filename hash ]
and there is a second file,
longfn.[ filename hash ].fullname
that stores the "encrypted blob".

I think this is the way to go. Build it into encfs instead of creating an external filesystem. An external filesystem would be difficult to test and use, and hence error-prone and fragile.

The construction using the sha256 hash makes sure file lookup is efficient even for long file names. There should be no performance impact for directories not containing long file names. Each long file name in a folder will hit readdir performance with the penalty of reading the longfn.[ filename hash ].fullname.

Pull requests welcome!


fulldecent · 2015-02-28T15:24:06Z

-1 this significantly increases the cost of readdir

Readme.md:

Fast on classical HDDs

EncFS is typically much faster than ecryptfs for stat()-heavy workloads when the backing device is a classical hard disk. This is because ecryptfs has to read each file header to determine the file size - EncFS does not. This is one additional seek for each stat. See PERFORMANCE.md for detailed benchmarks on HDD, SSD and ramdisk.

vgough · 2015-03-01T06:03:33Z

It isn't necessary to store all filenames this way, just the ones that would otherwise be too long. Ideally you would only pay a performance penalty when using long filenames, otherwise you wouldn't see a difference.

fulldecent · 2015-03-01T12:57:40Z

Got it. Because encrypted files may be moved across different host filesystems, "too long filename" must be defined independently of the host filesystem. Practically, most filesystems support 255 bytes (https://en.wikipedia.org/wiki/Comparison_of_file_systems). So this issue is, more specifically, "alternative filename storage for filenames over 190 characters".

Aikhjarto · 2015-03-02T08:47:31Z

Not only too long filenames might raise problems but also too long pathnames.
E.g., last year I had some problems with encfs in combination with rsync hosters. They supported only 255 bytes (minus the /home/$USER string) as maximum pathname.

I would really like to see an option for encfs to set maximum length of filenames and pathnames separately.

lalomartins · 2015-04-01T22:21:43Z

FWIW, there are two reasons I'm looking for an alternative to ecryptfs, and the top one is file name length; ecryptfs' limit (135 characters IIRC) is just not workable for 2015. (The second reason is performance on regular hard disks, which is why I ended up looking at encfs.)

fulldecent · 2015-05-04T01:16:33Z

For people who are following this issue, I am interested to know whether it is because your file names are long or because there is a lot of folder nesting. If it is the latter, then storing the file name in the file won't help here.

Here is one solution:

Store naming information in a "directory file" in each directory. Example cipher text:

/index.idx
/1.dat
/2.dat
/3.dat
/4/
/4/index.idx
/4/1.dat

This solution creates a locking problem with index.idx.

lalomartins · 2015-05-04T01:20:11Z

It's because the filenames are long, for me at least. Some programs generate filenames automatically based on some data or another, hashes in particular, and they assume filesystems have a reasonable limit, something like 256. One such program had a bug reported on that topic and closed it with a “wontfix” since, well, if it's just one or two filesystems that have such a small limit, better that people just don't use those, eh? :-)

vgough · 2015-06-11T03:07:54Z

Most people have probably never had to worry about short names (8+3 anyone?), and hash-based names have become common with programs like Git, as it provides a convenient way to do content-based addressing. So I think that long names are here to stay.

One problem is that encryption expands the names, so those long names become even longer. It seems feasible to hide long names from the underlying FS without big changes to the encfs, and without impacting performance of anyone who isn't using long names. For example, handle filenames as normal unless it has a certain prefix, in which case you have to read the name from elsewhere (like the file header). Older versions of encfs would be able to read everything but the long files, as it wouldn't be able to decode those names. As @fulldecent points out, there isn't any one consistent limit to target, so this would probably be a FS config option.

Producing shorter paths would be a more major overhaul, not something I'd imagine being backward compatible with existing filesystems. So I'm not inclined to worry about path limits.

lalomartins · 2015-06-11T14:05:20Z

I'd be fine with this solution.

What do you mean with the config option? Minimum length to encode, or maximum to allow? I think the latter isn't necessary at all — it would be kind of ironic if after this change encfs becomes one of the few fs to allow arbitrarily long filenames and then you need to rename your files when you want to move them out 😉 And minimum length to encode can be easily autodetected on mount, based on the host filesystem limit + encoding overhead.

vgough · 2015-06-12T05:16:36Z

Config would be for the maximum encoded name length. Since Encfs volumes can be moved between systems, an Encfs folder could always contain longer names than the underlying storage system supports, so it's no different than what a user would already face if they tried to move files from one filesystem to one with more restrictive limits.

dpc · 2015-07-20T18:35:26Z

My problem is encfs produces too long filenames, and due to this I can't sync it with other hosts using tools like syncthing.

Wouldn't it be possible to have an option that would store metadata in a separate file or something? I don't care about the performance. I just want everything to work smoothly.

hpctech · 2015-11-30T12:20:10Z

+1
I have the same issue: rsync fails with "File name too long (36)", although the file system of both the source and target disks is EXT4.

The full log of rsync is: https://paste.kde.org/pkkasicb
Encfs: 1.7.4
Linux mint 17.2 xfce x64

rfjakob · 2015-11-30T12:23:31Z

Just because I am curious: what kind of filenames are you working with that are so long? (The paste link you sent does not work.)

hpctech · 2015-11-30T12:28:01Z

@rfjakob

> what kind of filenames are you working with that are so long?

As I mentioned above I use ext4

> The paste link you sent does not work

Sorry. I created a new one:
https://gist.github.com/hpctech/c6a88435280ef6693c86#file-rsync-log

rfjakob · 2015-11-30T12:45:26Z

What I mean is that I don't understand why you have file names that are so long. They must be over 160 characters unencrypted.

hpctech · 2015-11-30T13:22:47Z

@rfjakob

> Must be over 160 character unencrypted

I'm not sure whether there are file names longer than 160 characters in unencrypted form, but what I am sure of is that I don't face any problem when copying the files with cp or rsync in unencrypted mode.

lalomartins · 2015-11-30T13:36:49Z

Transmission creates “resume” files which are IIRC the torrent's filename plus its hash. I of course know nothing whatsoever about illegal torrents, but legitimate ones often have very long names, with project name, release number, source, and sometimes a small CRC; when Transmission adds the hash, those often go over 160.

I've “solved” that by keeping those out of encfs ;-) since there's not much point in encrypting them. But if someone wants to encrypt their entire home (as I used to before switching to encfs), that would be a problem.

fulldecent · 2015-11-30T16:23:33Z

Just being creative here.

What if this capability was delivered by a separate Fuse module called, say, foldfs. You mount a "source" filesystem with fold or could mount a "folded" system with --reverse. There is also configuration file, just like EncFS.

For #7, we would support the feature --max-filename-length which restricts all filenames in the folded system to a specified length. We don't care how, but the implementation would probably save the original name into a header in that file. Also, handling (or forbidding) collisions would be in scope of the implementation. Then EncFS would mark #7 as WONTFIX and just say "you should use fold".

For the future, other options may include: making filenames lowercase (case insensitive), restricting file length (splitting), restricting dots at the beginning of the name, saving OS X metadata into flat files, restricting directory depth, unicode filenames, getting around other restrictions in https://en.wikipedia.org/wiki/Comparison_of_file_systems

This glue would be useful to anyone needing to store files into a more restrictive file system (backups/rsyncs, other use cases above). And this would be reusable for people needing the above but not needing encryption.

hpctech · 2015-12-01T12:23:28Z

Guys I'm really confused.

What is the verdict of this discussion? Do I have to stop using EncFS because of this bug?
I'm not nagging, but from an end-user point of view this bug is really awful because it occurs frequently in many cases.

Thanks for your contributions and I wish to get a solution because until now EncFS is the only choice for me under Linux.

rfjakob · 2015-12-01T12:30:58Z

@hpctech I am sorry, but I think it is unlikely that this will be fixed.

lalomartins · 2015-12-01T12:32:06Z

@fulldecent I think a “fold” filesystem is a good idea in its own merit. It's not a fix for this issue, though, which is an EncFS bug.

To put it simply: if you're a filesystem, and you're going to generate filenames based on the original name plus something, it's your job to make sure it still works.

As a tangential point, since there's a direct correlation between encoded name length and original name length, I tend to feel the level of protection/encryption is less than ideal. For particularly large data sets, this could be exploited to correlate them, especially when combining with file sizes, and then you have a huge body of plain text to help break the encryption.

hpctech · 2015-12-01T13:00:34Z

@rfjakob It seems I have to move to cryptomator, although it's not as efficient as EncFS.

Code7R · 2016-01-28T06:54:27Z

@vgough From the Debian packaging front, there is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=536752 about this issue... is there any workaround that gets your blessing?

fulldecent · 2016-01-28T14:46:32Z

The fact that this issue is still open indicates indecision and prevents everyone from working towards a consensus solution (or multiple competing solutions).

@rfjakob Above you mention:

> I am sorry, but I think it is unlikely that this will be fixed.

If this is true, I would respectfully request this issue to be closed as WONTFIX. Further, your endorsement of a foldfs-style or other approach would be enough for me (and perhaps others) to start working on that solution. End result is: interested people start working on a solution and maybe in the future it can even be merged into encfs proper.

rfjakob · 2016-01-30T22:11:35Z

William, I will gladly support a pull request if you implement this, and so will valient, I believe.

I will read through the proposals tomorrow and think about what I find the most viable.

rfjakob · 2016-01-31T15:33:07Z

@fulldecent Just to revisit the current design, it is like this:

[ plain text filename ] --- encryption ---> [ encrypted blob ] --- base64 ---> [ encrypted filename ]

For short filenames (<= 176 bytes), nothing changes
For longer filenames, I think we should do this:

[ plain text filename ] --- encryption ---> [ encrypted blob ] --- sha256 --- hexdump ---> [ filename hash ]

The file would be called
longfn.[ filename hash ]
and there is a second file,
longfn.[ filename hash ].fullname
that stores the "encrypted blob".

I think this is the way to go. Build it into encfs instead of creating an external filesystem. An external filesystem would be difficult to test and use, and hence error-prone and fragile.

The construction using the sha256 hash makes sure file lookup is efficient even for long file names. There should be no performance impact for directories not containing long file names. Each long file name in a folder will hit readdir performance with the penalty of reading the longfn.[ filename hash ].fullname.
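
For readers following along, the naming rule described above can be sketched roughly as follows. This is only an illustration, not code from encfs or gocryptfs: the function name `on_disk_names`, the constant `MAX_ENCODED_LEN`, the use of Python's padded URL-safe Base64, and the choice to test the encoded length against 255 (rather than the plaintext length against 176 bytes) are all assumptions made for the example.

```python
import base64
import hashlib
from typing import Optional, Tuple

# Assumption: the host filesystem accepts names of up to 255 bytes.
MAX_ENCODED_LEN = 255


def on_disk_names(encrypted_blob: bytes) -> Tuple[str, Optional[str]]:
    """Map an encrypted filename blob to (file name, sidecar name or None)."""
    encoded = base64.urlsafe_b64encode(encrypted_blob).decode("ascii")
    if len(encoded) <= MAX_ENCODED_LEN:
        # Short name: nothing changes, the Base64 text is the filename.
        return encoded, None
    # Long name: sha256 of the blob, hex-encoded, always 64 characters.
    digest = hashlib.sha256(encrypted_blob).hexdigest()
    name = "longfn." + digest
    # The sidecar file is where the full encrypted blob would be stored.
    return name, name + ".fullname"
```

With these assumptions a long name always becomes a 71-character file name plus an 80-character sidecar name, regardless of how long the original was.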

fulldecent · 2016-03-02T14:43:24Z

@rfjakob I have been looking at this and am almost ready to get started. Previously I was thinking to do a separate filesystem which solves this and other problems. For that, the approach:

... --- sha256 --- hexdump ---> [ filename hash ]
longfn.[ filename hash ]
longfn.[ filename hash ].fullname

is great. The only problem is that fopen() goes from O(1) to O(n) because hash is one way. But if this is built into encfs we can also assume that [ encrypted blob ] is uniform. Therefore we can get O(1) easily using:

[ plain text filename ] --- encryption ---> [ encrypted blob ] --- base64 --- substr ---> [ encrypted prefix ]

The file would be called
longfn.[ encrypted prefix ]
and there is a second file,
longfn.[ encrypted prefix ].fullname
that stores the "encrypted blob".

Does this make sense?

rfjakob · 2016-03-03T00:11:57Z

Hi there, I have implemented the scheme I proposed in gocryptfs by now (latest master), if you want to take a look.

fopen should be O(1) because you just open the file by hash instead of by name. Readdir becomes O(n + 2*m), with n the number of short files and m the number of long files in the directory.
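
To make the readdir cost concrete, here is a rough sketch of a directory listing under the longfn scheme. It is not taken from gocryptfs or encfs; `read_encrypted_blobs` and the two constants are hypothetical names, and short names are assumed to be padded URL-safe Base64. The loop visits n + 2*m directory entries (each long file contributes its own entry plus its sidecar) and does one extra file read per long name.

```python
import base64
import os
from typing import List

LONG_PREFIX = "longfn."        # marker for hashed long names (from the proposal)
SIDECAR_SUFFIX = ".fullname"   # sidecar holding the full encrypted blob


def read_encrypted_blobs(dirpath: str) -> List[bytes]:
    """Return the encrypted filename blob for every file in dirpath."""
    blobs = []
    for entry in sorted(os.listdir(dirpath)):
        if entry.startswith(LONG_PREFIX):
            if entry.endswith(SIDECAR_SUFFIX):
                continue  # sidecars are read through their main entry
            # Long name: one extra read to fetch the full encrypted blob.
            sidecar = os.path.join(dirpath, entry + SIDECAR_SUFFIX)
            with open(sidecar, "rb") as f:
                blobs.append(f.read())
        else:
            # Short name: the blob is recovered from the name itself.
            blobs.append(base64.urlsafe_b64decode(entry))
    return blobs
```

Since "." is not in the Base64 alphabet, a short encoded name can never start with `longfn.`, so the prefix check alone is enough to tell the two kinds apart in this sketch.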

fulldecent · 2016-03-05T17:31:45Z

Got it, thank you.

My main use case is --reverse. Is your approach's complexity still correct there?

rfjakob · 2016-03-15T08:13:07Z

Oh, reverse mode! No, this would be O(n), as you said. But it looks like the substring approach would be just the same. You still have to read the whole directory to find the right file.

But for reverse mode you can cache the plaintext <-> hash relation. Having maybe 1000 cached entries will probably eliminate most directory reads.
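
The cache idea could look something like the sketch below: a small least-recently-used map from filename hash to plaintext name, so that a reverse-mode lookup of `longfn.[ filename hash ]` only needs a directory scan on a miss. The class name `LongNameCache` and its methods are invented for illustration; the default capacity of 1000 is simply the figure mentioned above.

```python
from collections import OrderedDict
from typing import Optional


class LongNameCache:
    """LRU cache mapping a filename hash to its plaintext name."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()

    def put(self, name_hash: str, plaintext_name: str) -> None:
        self._entries[name_hash] = plaintext_name
        self._entries.move_to_end(name_hash)  # mark as most recently used
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def get(self, name_hash: str) -> Optional[str]:
        name = self._entries.get(name_hash)
        if name is not None:
            self._entries.move_to_end(name_hash)  # refresh on hit
        return name
```

On a miss (`get` returns `None`) the caller would still have to read the directory, encrypt and hash each long name, and `put` the results.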

gwpl · 2016-05-14T15:04:55Z

I had a similar idea for another overlay filesystem that just resolves "too long filenames" and otherwise proxies 1:1 to the underlying filesystem. While searching for such a "fuse overlaying filesystem too long filename", I've found that I am not the only one with the idea: #7 (comment) .

Has anyone actually come across some fuse overlay filesystem providing such functionality?
It should be a pretty straightforward implementation compared to other filesystems.

Cross referencing: I've asked also here: http://unix.stackexchange.com/q/283149/9689

fulldecent · 2016-05-15T15:40:10Z

@gwpl In summary from this discussion above: a filename-translating filesystem works and the way to set it up is pretty clear; but the reversing process can be slow because of how filesystems work.

gwpl · 2016-05-15T17:52:11Z

@fulldecent, I assume you meant that it's built into EncFS, did you?
I appreciate such a built-in feature of EncFS, but I would still like to have the ability to add one more overlay FS resolving only this issue (without encryption etc.). This might have multiple applications, including when you are forced to use ecryptfs.
If I missed something please correct me.

Btw. If EncFS has support for encoding long filenames, one might want to update:
https://en.wikipedia.org/wiki/EncFS#No_support_for_very_long_filenames .

rfjakob · 2016-05-18T06:43:02Z

No, it's not built into EncFS (yet).

Regarding ecryptfs, I am sceptical that you can mount ecryptfs on top of a FUSE file system. It doesn't even work properly on NFS ( https://bugs.launchpad.net/ecryptfs/+bug/277578 ).

gwpl · 2016-05-18T16:25:47Z

@rfjakob so is it gocryptfs?
I thought of it the other way round: mounting one more layer above ecryptfs that would just translate long filenames and do nothing else (i.e. proxy all other functionality to the underlying filesystem).
Can gocryptfs be configured to work in such a mode? Could you provide such an example in its documentation and post a reference here? Thank you :).
(P.S. Still, having a separate short & sweet project specialized just in fixing the problem of long filenames would be nice, easy to review, and educational.)

dimovnike · 2016-06-02T16:44:44Z

Hi, another question about filenames: is there a way to make encfs -r generate only lowercase filenames? (I'm having problems with some Dropbox-like application because of case-sensitive filenames.)

imperative · 2016-06-25T09:02:03Z

The index-file (a special file that is stored in every directory) could be made in something like sqlite, thus improving the performance somewhat (although probably would not allow for multithreading).

Besides some extra complexity and an extra dependency, is this solution generally better than "longfn.[ filename hash ].fullname"- approach?

I think many people who will use this (for backups especially) will not care much about filename-seeking performance. (Or will not come to use cases where this performance will matter. The number of files will be so small relative to the current performance of common storage solutions.)

Also many times the path length is indeed the problem (on windows a lot), but in practice it only becomes the problem because encfs bloats the filenames. Considering this, if we implement "longfn" solution, the actual signature itself should be shorter than "longfn" - preferably only one character, like "l.". (That is still enough as a signature because it will be distinct enough in encfs filenaming scheme, right?)

fulldecent · 2016-06-26T18:42:06Z

@imperative This fails at multithreading, which is a baseline assumption of encfs.

rfjakob · 2016-06-26T19:07:09Z

The problem with any centralized database is that it basically explodes once multiple encfs processes write to the same directory. So this would kill the dropbox use case.

Owyn · 2016-11-07T11:31:24Z

Can't we just make SHORT filenames? (not longer than original names)

Look at how others handle this: CrococryptMirror for example - it encrypts every filename into 4 symbol filename (which is in 99% times shorter than the original)

Environment

Windows version: Yes

Description

Even with an option to have shortest possible encrypted filenames - stream - those are still longer than originals, i mean usually twice as long, eg:
test.txt - is now twice longer vKeFilGHSVaediHAJ,

that's not really the problem, the real problem comes with folders and subfolders and subfolders inside subfolders... few folders placed in each other easily reach the filesystem max-length for name, filesystem have problems with it, programs have problems with it, cloud-storage have problems with it.. just problems we have.

Expected behavior vs. actual behavior

we should have an option for kitchen-level user filename protection which is not null but won't be longer than original

something like:
original: test.txt (8 symbols)
output: vKeFideJ (8 symbols or shorter (just not longer))

to not put the original key in danger by this (or however filename encryption is done), another key(password?) could be used, we need SOMETHING to not have problems with name-lengths of files...

fulldecent · 2016-11-07T17:58:34Z

@Owyn See that new .container file on the right image? This is a directory index. Please read the discussion above as this has been discussed at length.

Owyn · 2016-11-07T20:22:06Z

@fulldecent ok, but what about just some simple filename encoding? Like just XOR encryption or something which wouldn't make filenames longer. I know it's not military-level defense, but I just want things to work and not break because of the path-length limit. It's not necessary to use the whole key for this, just a part of it or something derived from it to not risk it, so even if filenames get cracked, contents won't. A weak encryption is much better for filenames than no encryption at all, which is now the only possible way to use encfs for many use cases including mine.

ghost · 2017-09-09T11:33:11Z

I'm hitting this issue. I'm saving my bitcoin wallet in encfs. Any solutions to this?

rfjakob · 2017-09-09T11:47:18Z

Nothing yet in encfs. If you want long file names, you have two options:

create a new encfs filesystem through expert mode, with file name encryption disabled
switch to something else

therealmarv · 2017-11-05T11:40:25Z

For the windows users: I think (but untested) that this is resolved in newest Windows 10 with this fix: https://www.howtogeek.com/266621/how-to-make-windows-10-accept-file-paths-over-260-characters/ Maybe somebody can confirm? See also jetwhiz#63

Francewhoa · 2020-11-14T07:02:19Z

This is to confirm this challenge is still present with EncFS 1.9.5

Using:

EncFS 1.9.5 default configuration
Debian 10 Buster

Blackclaws · 2021-01-03T13:14:29Z

This issue is definitely something that produces problems when using encfs in conjunction with nextcloud, as that also has issues with longer filenames.

JanKanis · 2024-04-25T21:05:20Z

> ok, but what about just some simply filename encoding? Like just XOR encryption or something which wouldn't make filenames longer, I know it's not military-level defense, but I just want things to work and not break cause of path-length limit.

I was going to comment something similar. There are length preserving encryption modes that are quite secure (xts, adiantum, hctr2). But the main length increasing step is converting the binary encrypted data to base64. Without that, the encrypted file names would contain random binary data, or at least random rare unicode characters, which is probably going to cause more problems than the current length limitations.
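
As a quick illustration of the Base64 expansion mentioned above (4 output characters for every 3 input bytes), the snippet below measures the encoded length for a given number of raw bytes. It uses Python's standard padded Base64, which is an assumption for this example; the exact encoding and any per-name overhead in EncFS may differ, and `encoded_length` is just an illustrative helper name.

```python
import base64


def encoded_length(raw_len: int) -> int:
    """Length of the padded Base64 text for raw_len bytes of binary data."""
    return len(base64.urlsafe_b64encode(bytes(raw_len)))


# 189 raw bytes still fit under a 255-byte filename limit ...
print(encoded_length(189))  # 252
# ... but one more byte rounds up to the next 4-character group.
print(encoded_length(190))  # 256
```

So under these assumptions a 255-byte host limit leaves room for at most 189 bytes of encrypted name data, before counting anything the encryption itself adds.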

vgough added backward incompatible and removed backward incompatible labels Aug 29, 2014

Germar mentioned this issue Oct 11, 2015

encfs filename too long bit-team/backintime#445

Closed

rfjakob added 1.x candidate and removed 2.x candidate labels Jan 31, 2016

dpc mentioned this issue Apr 23, 2016

Unexpected "filename too long"-error on ecryptfs / encfs syncthing/syncthing#547

Closed

AudriusButkevicius mentioned this issue Jun 25, 2016

Syncthing runs into encfs file name length limit syncthing/syncthing#3338

Closed

valentin-lup mentioned this issue Sep 1, 2016

Segmentation Fault on mount, before password check #201

Closed

jetwhiz mentioned this issue Nov 7, 2016

Path name too long issue - request for SHORTER filename-lengths jetwhiz/encfs4win#63

Closed

benrubson mentioned this issue Mar 20, 2017

EncFS development ? #314

Closed

taz-007 mentioned this issue May 21, 2018

Error after upgrading archlinux32 #524

Closed
