"No such file or directory" error when filename contains non-ASCII character #2190

Closed
apartridge opened this issue Jan 20, 2022 · 8 comments · Fixed by #2222

@apartridge

apartridge commented Jan 20, 2022

OS: Windows 10
Visual Studio 2017 (15.9.24)

We noticed a change in behavior when upgrading netcdf, affecting file names that contain non-ASCII characters. The following sample fails with an "Error: No such file or directory" when using netcdf-c 4.8.1. It works on netcdf-c 4.7.0. (Note: We also used some different build options between these builds, but both were built against HDF5 1.10.5.)

#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

#define FILE_NAME "C:/Users/X/Documents/fooÅ.nc"
#define ERRCODE 2
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

int main()
{
	std::cout << "Start\n";
	int ncid;
	int retval;

	if ((retval = nc_open(FILE_NAME, NC_NOWRITE, &ncid)))
		ERR(retval);

	if ((retval = nc_close(ncid)))
		ERR(retval);

	printf("*** SUCCESS reading example file %s!\n", FILE_NAME);
	return 0;
}

Is this a known issue?

@WardF
Member

WardF commented Jan 20, 2022

@DennisHeimbigner could this be related to the msys pathing work?

@DennisHeimbigner
Collaborator

Windows combined with UTF-8 is known to be tricky because Microsoft refuses to bite the bullet
and get rid of the Windows default encoding, CP-1252.
Let me try out your test program and see what is going on.

@DennisHeimbigner
Collaborator

As near as I can tell, in the above code, the 'Å' character is encoded as CP-1252, not UTF-8.
There is probably a way to get the Visual Studio compiler to treat the input file as UTF-8,
but I do not know what it is off-hand.
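
One way to check which encoding a string literal ended up in is to test whether its bytes are well-formed UTF-8: a lone 0xC5 followed by an ASCII character is not, while the pair 0xC3 0x85 is. The helper below is a minimal sketch for that check; `is_valid_utf8` is a hypothetical name and is not part of netcdf-c. For reference, MSVC has a `/utf-8` option that sets both the source and execution character sets to UTF-8, although whether it suits a given build is an assumption to verify.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a netcdf-c function): returns true if `s` is
// well-formed UTF-8. Rejects overlong forms, UTF-16 surrogates, and code
// points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (i + len > n) return false;   // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;     // overlong encodings
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && cp < 0x10000) return false;
        if (cp > 0x10FFFF) return false;             // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

With this sketch, `is_valid_utf8("foo\xC5.nc")` is false (the CP-1252 form of the name), and `is_valid_utf8("foo\xC3\x85.nc")` is true (the UTF-8 form).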

@apartridge
Author

@DennisHeimbigner I checked the encoding more closely using the following change to the sample, and yes, it is stored as CP-1252. The Å character is stored as 197 (dec) = 0xC5, matching https://en.wikipedia.org/wiki/Windows-1252. This works with netcdf-c 4.7.0, but not 4.8.1.

We observed this in a GUI application built with Qt, where we use qUrl.toLocalFile().toLocal8Bit() to produce this string. This has worked fine for us on both Windows and Ubuntu, but after our netcdf upgrade it stopped working on Windows (it still works fine on Ubuntu).

#include <iostream>
#include <string>
#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

#define FILE_NAME "C:/Users/Stian/Documents/TestExport/fooÅ.nc"

#define ERRCODE 2
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

int main()
{
	int ncid;
	int retval;

	auto data = std::string{ FILE_NAME };
	for (std::size_t i = 0; i < data.size(); i++) {
		std::cout << " " << data[i] << " " << (int)((unsigned char)data[i]) << std::endl;
	}

	if ((retval = nc_open(FILE_NAME, NC_NOWRITE, &ncid)))
		ERR(retval);

	if ((retval = nc_close(ncid)))
		ERR(retval);

	printf("*** SUCCESS reading example file %s!\n", FILE_NAME);
	return 0;
}
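
For callers who cannot change how the path string is produced, one possible workaround is to re-encode the CP-1252 bytes as UTF-8 before passing the path to the library. The following is a minimal sketch under the assumption that the input really is CP-1252 (the local code page varies by region, so this does not hold in general); `cp1252_to_utf8` is a hypothetical helper, not a netcdf-c or Qt function. In a Qt application, calling `toUtf8()` on the `QString` is likely the simpler route.

```cpp
#include <string>

// Unicode code points for CP-1252 bytes 0x80..0x9F. The five bytes that
// CP-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed
// through as the C1 control with the same value.
static const unsigned short kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Hypothetical helper: converts a CP-1252 byte string to UTF-8.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        // Bytes outside 0x80..0x9F map to the code point of the same value.
        unsigned cp = b;
        if (b >= 0x80 && b <= 0x9F) cp = kCp1252High[b - 0x80];
        if (cp < 0x80) {                       // 1-byte UTF-8 (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2-byte UTF-8
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 3-byte UTF-8
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Applied to the file name above, the single byte 0xC5 for Å becomes the two bytes 0xC3 0x85.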

@DennisHeimbigner
Collaborator

Are you in a position to change the 'qUrl.toLocalFile().toLocal8Bit()'
to produce UTF-8 and try that? Perhaps there exists a '.toUTF8' or similar.

@DennisHeimbigner
Collaborator

I ran across a note that says that, as of Windows 10 version 1903, the 8-bit functions
can handle UTF-8 (assuming the locale is set to .UTF-8).

@apartridge
Author

apartridge commented Jan 24, 2022

Hi @DennisHeimbigner

I think you are referring to https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page.

I have tested modifying the reproducer above, by encoding the Å character in UTF-8 ("\xc3" "\x85"), and enabling the Windows setting "Beta: Use Unicode UTF-8 for worldwide language support", following https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window/57134096#57134096. Code below.

With that, the file can be opened correctly. (Without this Beta setting enabled, the file cannot be opened; it gives "HDF error".)

On Windows, the 8-bit (const char*) file functions expect the path to be in the local encoding, which varies by region (unless the Beta setting above is enabled, which sets the local encoding to UTF-8). This is described in the "-A vs. -W APIs" section of https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page. So I think there is a regression in netcdf: it no longer works with the locale encoding on Windows unless the locale is changed to UTF-8 using this Beta setting. This worked as expected in netcdf-c 4.7.0 but seems to be broken in 4.8.1.

#include <iostream>
#include <string>
#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

#define ERRCODE 2
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

int main()
{
	std::cout << "Start\n";
	int ncid;
	int retval;

	auto str = "C:/Users/Stian/Documents/TestExport/foo" "\xc3" "\x85"  ".zdf";
	auto data = std::string{ str };
	for (std::size_t i = 0; i < data.size(); i++) {
		std::cout << " " << " " << (int)((unsigned char)data[i]) << std::endl;
	}

	if ((retval = nc_open(data.c_str(), NC_NOWRITE, &ncid)))
		ERR(retval);

	if ((retval = nc_close(ncid)))
		ERR(retval);

	printf("*** SUCCESS reading example file %s!\n", data.c_str());
	return 0;
}
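
The "-A vs. -W APIs" distinction mentioned above suggests another approach that does not depend on the active code page: convert a UTF-8 path to UTF-16 and call a wide-character function. The sketch below illustrates that idea only and is not how netcdf-c handles paths; `utf8_to_utf16` and `open_utf8_path` are hypothetical names. The decoder is hand-written so that it runs on any platform, and malformed input is replaced with U+FFFD instead of being rejected.

```cpp
#include <cstddef>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i++]);
        char32_t cp = 0xFFFD;      // replacement character by default
        std::size_t extra = 0;     // number of continuation bytes expected
        if (c < 0x80)                { cp = c; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        for (std::size_t k = 0; k < extra; ++k) {
            const unsigned char cc =
                i < in.size() ? static_cast<unsigned char>(in[i]) : 0;
            if ((cc & 0xC0) != 0x80) { cp = 0xFFFD; break; }  // malformed
            cp = (cp << 6) | (cc & 0x3F);
            ++i;
        }
        if (cp > 0x10FFFF) cp = 0xFFFD;
        if (cp >= 0x10000) {       // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: opens a file whose path is given as UTF-8.
std::FILE* open_utf8_path(const std::string& path) {
#ifdef _WIN32
    // wchar_t is 16 bits wide on Windows, so the code units copy over directly.
    const std::u16string u16 = utf8_to_utf16(path);
    const std::wstring wide(u16.begin(), u16.end());
    return _wfopen(wide.c_str(), L"rb");
#else
    // Assumption: other platforms accept UTF-8 paths in the narrow API.
    return std::fopen(path.c_str(), "rb");
#endif
}
```

For the path in the sample, the bytes 0xC3 0x85 decode to the single UTF-16 code unit 0x00C5.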

@DennisHeimbigner
Collaborator

I have begun a discussion about this here:

#2220

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Feb 9, 2022
re: Issue Unidata#2190

The primary purpose of this PR is to improve the utf8 support
for Windows. This is pursuant to a change in Windows that
supports utf8 natively (almost). The almost means that it is
still utf16 internally and the set of characters representable
by utf8 is larger than those representable by utf16.

This leaves open the question in the Issue about handling
the Windows 1252 character set.

This required the following changes:

1. Test the Windows build and major version in order to see if
   native utf8 is supported.
2. If native utf8 is supported, modify dpathmgr.c to call the 8-bit
   version of the windows fopen() and open() functions.
3. In support of this, programs that use XGetOpt (Windows versions)
   need to get the command line as utf8 and then parse to
   argc+argv as utf8. This requires using a homegrown command line parser
   named XCommandLineToArgvA.
4. Add a utility program called "acpget" that prints out the
   current Windows code page and locale.
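
As a rough illustration of items 1 and 4, the active code page can be queried on Windows with `GetACP()`, where the value 65001 denotes UTF-8. The sketch below is not the actual "acpget" source; `code_page_name` and `active_code_page` are hypothetical names, and the non-Windows branch exists only so the example compiles elsewhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: names the two code pages discussed in this issue
// and falls back to the number for anything else.
std::string code_page_name(unsigned cp) {
    switch (cp) {
        case 1252:  return "Windows-1252";
        case 65001: return "UTF-8";
        default:    return "code page " + std::to_string(cp);
    }
}

// Hypothetical helper: describes the active ANSI code page on Windows.
std::string active_code_page() {
#ifdef _WIN32
    return code_page_name(GetACP());
#else
    return "n/a (not Windows)";
#endif
}
```

A result of "UTF-8" from `active_code_page()` would indicate that the 8-bit file functions interpret paths as UTF-8 in the current process.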

Additionally, some technical debt was cleaned up as follows:

1. Unify all the places which attempt to read all or a part
   of a file into the dutil.c#NC_readfile code.
2. Similarly unify all the code that creates temp files into
   dutil.c#NC_mktmp code.
3. Convert almost all remaining calls to fopen() and open()
   to NCfopen() and NCopen3(). This is to ensure that path management
   is used consistently. This touches a number of files.
4. extern->EXTERNL as needed to get it to work under Windows.