
Fanbox: Filename is too long, image is saved into pixivutil directory #525

Closed
2 tasks done
NHOrus opened this issue Sep 11, 2019 · 8 comments

Comments

@NHOrus
Contributor

NHOrus commented Sep 11, 2019

Prerequisites

  • Did you read FAQ section?
  • Did you test with the latest release or commit?

Description

On Linux, when trying to download a work with

filenameformat = %member_id%/%image_id% - %title%
filenamemangaformat = %member_id%/%urlFilename% - %title%
filenameinfoformat = %member_id%/%image_id% - %title%

Files from Fanbox are saved into the PixivUtil folder instead of the artist's folder.

The artist used an excessively long title:

Start downloading... Using Referer: https://www.pixiv.net/fanbox/creator/39182623/post/446430
Error at download_image(): Cannot save https://fanbox.pixiv.net/images/post/446430/O0x7ASU437evHkT61U7YsVW5.png to /home/nho/adata/pixiv/39182623/446430_p3_O0x7ASU437evHkT61U7YsVW5 - 4枚に+12枚=16カット。(カバーでは起きてますが行為中は起きません:ボテ絵)指で局部広げ・挿入に3カット・抽挿に4カット・射精に3カット・ペニス引き抜き溢れ精液に3カット・ボテ1カット になります。(+下書き一枚を挟んで文字なしver.).png: (<type 'exceptions.IOError'>, IOError(36, 'File name too long'), <traceback object at 0x7fe9e9f587a0>)
File is saved to O0x7ASU437evHkT61U7YsVW5.png

I expect the file to be saved into the correct folder, possibly without the title or with a truncated filename.

Versions

Current git, reported as 20190907b

@photonometric

photonometric commented Oct 10, 2019

Ah yeah, the attempted filename/path length in that error came out to 403 characters, while the limit is 255 on all common modern filesystems.

The same thing happened with a booru downloader I used to follow when someone used a TAG variable and an image had 20-30 tags. They implemented a hard character limit on all filenames.

So something like 250 (as a buffer) - <root dir length> - <file ext length> = max output length of the three filename format settings would probably solve this, as well as other cases that could come up without using %title%.
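A minimal sketch of that budget calculation, assuming the 250-character cap suggested above and counting characters rather than bytes. The names `max_template_length` and `fit_to_budget` are hypothetical, not PixivUtil2 code.

```python
import os

# Assumed cap from the comment: 250 instead of 255, leaving a small buffer.
LIMIT = 250

def max_template_length(root_dir: str, ext: str, limit: int = LIMIT) -> int:
    """Characters left for the expanded format string (e.g. '%member_id%/%image_id% - %title%')."""
    # Root directory + one path separator + extension all count against the limit.
    used = len(root_dir.rstrip(os.sep)) + len(os.sep) + len(ext)
    return max(limit - used, 0)

def fit_to_budget(expanded: str, root_dir: str, ext: str) -> str:
    """Hard-truncate the expanded format so root + name + extension stays within the limit."""
    return expanded[:max_template_length(root_dir, ext)]

budget = max_template_length("/home/nho/adata/pixiv", ".png")
print(budget)  # 224 = 250 - 21 - 1 - 4
```

This counts characters only; a byte-based filesystem limit would need the length measured after encoding.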

@NHOrus
Contributor Author

NHOrus commented Oct 10, 2019 via email

@photonometric

photonometric commented Oct 10, 2019

Four filename extension characters and one for separator

Sure, I meant the whole path + filename + extension as the "output" in my example. I'm not sure what you mean by one character for the separator; filenames might contain different numbers of separators (e.g. spaces) depending on the number of format variables, and with %tags% it would depend on the number of tags on the image. So I assume that is already computed on the fly to some extent, and a hard truncation to x characters if the filename length exceeds 250 (or on IOError 36) would be the easiest approach. I doubt it comes up often enough to justify conditionals on which tags to include after an error.
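The "hard truncate on IOError 36" idea could look roughly like this. It is a minimal sketch assuming Python 3 (where IOError is an alias of OSError) and a plain character cut; `open_truncating` and its `opener` parameter are hypothetical, the latter only so the fallback can be exercised without a real filesystem. A character cut can still exceed a byte-based limit for Japanese names.

```python
import errno
import os

def open_truncating(path, mode="wb", limit=250, opener=open):
    """Open `path`; on ENAMETOOLONG (errno 36 on Linux), hard-truncate the name and retry once."""
    try:
        return opener(path, mode)
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        if exc.errno != errno.ENAMETOOLONG:
            raise  # some other I/O problem: don't hide it
    dirname, basename = os.path.split(path)
    stem, ext = os.path.splitext(basename)
    # Keep the extension and cut the stem so the filename is at most `limit` characters.
    stem = stem[: max(limit - len(ext), 1)]
    return opener(os.path.join(dirname, stem + ext), mode)
```

Typical use would be `with open_truncating(target) as f: f.write(data)`; if the shortened name still fails, the second error propagates to the caller.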

But of course Nandaka knows the details of how filenames and format variables interact much better than I do x3. I was just giving this a bump with some basic thoughts because I was testing a related filename error ^^

@NHOrus
Contributor Author

NHOrus commented Oct 10, 2019

Ah, I misunderstood. Either way, the templated name needs to be cut to: limit - path - ".jpeg".
On Linux, NAME_MAX is 255 and PATH_MAX is 4096 (both in bytes).
On Windows, the limit (MAX_PATH) is 260 characters.

Except the limit counts bytes on Linux and characters on Windows, so for non-ASCII names the same name uses up more of the limit on Linux than on Windows.
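Rather than hard-coding these numbers, a sketch could ask the OS at runtime and measure a name in the unit each platform counts. This assumes `os.pathconf` is available (it is on Unix-like systems, not on Windows); the Windows fallback values (255 per component, 260 for MAX_PATH) and both helper names are assumptions for illustration.

```python
import os
import sys

def name_limits(directory="."):
    """Return (max filename length, max path length) for `directory`."""
    if hasattr(os, "pathconf"):
        # Unix-like systems: ask the filesystem (values are in bytes on Linux).
        return os.pathconf(directory, "PC_NAME_MAX"), os.pathconf(directory, "PC_PATH_MAX")
    # Assumed Windows defaults: 255 per component, 260 for the whole path (MAX_PATH).
    return 255, 260

def measured_length(name):
    """Length of `name` in the unit the platform limit counts."""
    if sys.platform == "win32":
        # Windows counts UTF-16 code units (2 for characters outside the BMP).
        return len(name.encode("utf-16-le")) // 2
    # Linux counts bytes; UTF-8 is assumed as the filesystem encoding.
    return len(name.encode("utf-8"))

print(name_limits())              # e.g. (255, 4096) on a typical Linux filesystem
print(measured_length("カット"))   # 9 outside Windows (3 bytes each), 3 on Windows
```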

It's a mess, honestly.

@Nandaka
Owner

Nandaka commented Oct 17, 2019

The filename is already cut to 255 characters in

if len(name) > 255:

/home/nho/adata/pixiv/39182623/446430_p3_O0x7ASU437evHkT61U7YsVW5 - 4枚に+12枚=16カット。(カバーでは起きてますが行為中は起きません:ボテ絵)指で局部広げ・挿入に3カット・抽挿に4カット・射精に3カット・ペニス引き抜き溢れ精液に3カット・ボテ1カット になります。(+下書き一枚を挟んで文字なしver.).png

That should be counted as 193 characters, right? Unless on Linux the kanji/kana are counted as double-width characters.
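A quick way to check: Python's `len()` on a `str` counts code points, so kana and kanji are not counted as double width, but the Linux limit is measured in encoded bytes. This sketch uses a made-up name of repeated katakana (not the title from the log) and a hypothetical helper wrapping the quoted guard, to show a name that passes `len(name) > 255` yet is far over 255 bytes in UTF-8.

```python
def passes_existing_check(name: str) -> bool:
    # Same condition as the quoted guard: only cut when len(name) > 255.
    return not len(name) > 255

name = "カット" * 70 + ".png"       # 214 code points, mostly katakana
print(passes_existing_check(name))  # True: the len() check lets it through
print(len(name))                    # 214
print(len(name.encode("utf-8")))    # 634: each katakana is 3 bytes in UTF-8
```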

Related call
https://github.com/Nandaka/PixivUtil2/blob/master/PixivUtil2.py#L1906

def sanitizeFilename(s, rootDir=None):

@Nandaka Nandaka closed this as completed Jan 1, 2020
@split-n
Contributor

split-n commented Jan 1, 2020

FYI:

For Windows, the limit usually applies to the FULL PATH: about 255 CHARS (MAX_PATH is 260, including the terminating null).
(There is a way to lift this limitation, but it is not commonly used:
https://docs.python.org/3/using/windows.html#removing-the-max-path-limitation )

For Linux, 255 BYTES is the maximum for FILENAME.
(UTF-8 is usually the encoding; most Japanese characters take 3 bytes, but there are exceptions.)
So if the directory path is long, Linux may be able to save a longer FILENAME than Windows.
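The difference can be sketched as two checks, using the 255 figures from the comment above: Windows (in the usual configuration) counts characters over the whole path, Linux counts UTF-8 bytes of the last component only. The function names and the long directory are hypothetical, and the Linux check deliberately ignores PATH_MAX.

```python
def fits_windows(dirname: str, basename: str, limit: int = 255) -> bool:
    # Windows (typical setup): the whole path, in characters, must fit.
    return len(dirname) + 1 + len(basename) <= limit

def fits_linux(dirname: str, basename: str, limit: int = 255) -> bool:
    # Linux: only the last component counts, measured in UTF-8 bytes.
    # (PATH_MAX, 4096 bytes, also applies to the whole path but is rarely hit.)
    return len(basename.encode("utf-8")) <= limit

long_dir = "/" + "d" * 200          # hypothetical deep download directory
name = "a" * 100 + ".png"           # 104 ASCII characters
print(fits_windows(long_dir, name)) # False: 201 + 1 + 104 = 306 > 255
print(fits_linux(long_dir, name))   # True: 104 bytes <= 255
```

With a long directory, the ASCII name fits on Linux but not on Windows; with Japanese names the byte count can flip that result.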

@Nandaka
Owner

Nandaka commented Jan 1, 2020

So if a directory path is long, Linux may be able to save longer FILENAME compared to Windows.

Shouldn't it be the other way around if the Linux limit is based on bytes? E.g. assuming the worst case (3 bytes per character), the maximum filename length would be 255/3 = 85 characters, wouldn't it?

@split-n
Contributor

split-n commented Jan 1, 2020

In the worst case, a character takes 4 bytes, but that is rare
(a few kanji like 𠮷, and most emoji such as 😀).
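The worst-case arithmetic can be checked directly: dividing the 255-byte limit by a character's UTF-8 width gives how many such characters fit in one filename. A small illustration with a hypothetical helper; note that ☺ (U+263A) is in the Basic Multilingual Plane and takes 3 bytes, while most emoji, such as 😀, take 4.

```python
def max_chars(ch: str, limit: int = 255) -> int:
    """How many copies of `ch` fit into `limit` bytes of UTF-8."""
    return limit // len(ch.encode("utf-8"))

for ch, label in [("a", "ASCII"), ("カ", "kana"), ("𠮷", "rare kanji"), ("😀", "emoji"), ("☺", "U+263A")]:
    width = len(ch.encode("utf-8"))
    print(f"{label}: {width} byte(s) -> {max_chars(ch)} chars")
```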

I came up with this code (not tested yet).

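    import os
    import platform

    def cut_name(name):  # hypothetical wrapper name; the snippet was the body of a larger function
        if platform.system() == 'Linux':
            # Linux: cut the filename (last path component) to <= 255 UTF-8 bytes
            dirname, basename = os.path.split(name)
            filename, extname = os.path.splitext(basename)
            while filename and len((filename + extname).encode('utf-8')) > 255:
                # drop one character from the stem, keeping the extension
                filename = filename[:-1]
            name = os.path.join(dirname, filename + extname)
        else:
            # Windows/Mac: cut the whole path to 250 chars if it exceeds 255
            if len(name) > 255:
                name = name[:250]
        return name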

split-n added a commit to split-n/PixivUtil2 that referenced this issue Jan 2, 2020
Cut filenames to <= 249 bytes of UTF-8 on Linux.
Fixes Nandaka#525

For Linux, 255 bytes is the maximum for a filename (not the full path).
Almost all Linux environments use UTF-8 for path encoding,
and the path is later used with the suffix ".pixiv" (6 bytes).
So 249 bytes is probably the safe maximum.
https://github.com/Nandaka/PixivUtil2/blob/dbb810716c4824724c03ce8e022a5b5d31f41800/PixivHelper.py#L590

Windows/Mac isn't affected.
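The 249-byte budget can also be applied in one step instead of removing characters in a loop: encode, slice the bytes, and decode with `errors="ignore"` so a multibyte character split by the slice is dropped rather than raising. This is a sketch under the commit's assumptions (UTF-8 paths, a 6-byte ".pixiv" suffix), not the code that was merged; `truncate_utf8` is a hypothetical name.

```python
import os

SUFFIX = ".pixiv"                             # appended to the path later, per the commit
BUDGET = 255 - len(SUFFIX.encode("utf-8"))    # 255 - 6 = 249 bytes

def truncate_utf8(basename: str, budget: int = BUDGET) -> str:
    """Cut `basename` to at most `budget` UTF-8 bytes, keeping its extension."""
    stem, ext = os.path.splitext(basename)
    room = max(budget - len(ext.encode("utf-8")), 0)
    # Slicing bytes may split a multibyte character; errors="ignore" drops that partial tail.
    stem = stem.encode("utf-8")[:room].decode("utf-8", errors="ignore")
    return stem + ext

name = truncate_utf8("カ" * 100 + ".png")
print(len(name.encode("utf-8")))  # 247: 81 kana (243 bytes) + ".png"
```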
Nandaka pushed a commit that referenced this issue Jan 2, 2020
…617)

35122 pushed a commit to 35122/PixivUtil2 that referenced this issue Oct 30, 2020
…andaka#617)


4 participants