UnicodeDecodeError when parsing BUFR file from DWD #28

Open
guidocioni opened this issue May 20, 2021 · 9 comments

@guidocioni

I haven't seen an open issue on this, forgive me if that's not the case.

I'm running the master version with eccodes v2.21.0.

I can successfully read the BUFR files from German weather stations here https://opendata.dwd.de/weather/weather_reports/synoptic/germany/ (like @meteoDaniel) but not the international ones here https://opendata.dwd.de/weather/weather_reports/synoptic/international/. In the latter case, after running this

df_stations = read_bufr('/tmp/latest.bin',
          columns=('stationOrSiteName',
                   'latitude',
                   'longitude',
                   'heightOfStationGroundAboveMeanSeaLevel',
                   'year', 'month', 'day', 'hour', 'minute',
                   ))

I get

~/miniconda3/lib/python3.8/site-packages/gribapi/gribapi.py in grib_get_string(msgid, key)
    489     err = lib.grib_get_string(h, key.encode(ENC), values, length_p)
    490     GRIB_CHECK(err)
--> 491     return ffi.string(values, length_p[0]).decode(ENC)
    492 
    493 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

I can successfully view the file content using grib_dump, but I would like to avoid having to dump everything to JSON first :)
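For readers who want to see the failure without a BUFR file, here is a minimal sketch that reproduces the same exception in plain Python. The `raw` buffer is a hypothetical stand-in for whatever bytes the C library hands back; only the 0xff byte at position 0 is taken from the traceback above.

```python
# Hypothetical stand-in for the raw buffer returned for one string key.
# The traceback only tells us that the first byte is 0xff.
raw = b"\xff" * 8

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # ASCII only covers bytes 0x00-0x7f, so 0xff cannot be decoded.
    message = str(exc)
    print(message)
```

Any byte above 0x7f triggers this, which is why the error shows up only for files where at least one string value is not plain ASCII text.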

@shahramn

I used the ecCodes bufr_filter tool with the following rule:

set unpack = 1;
print "Msg #[count]:
   stationOrSiteName=[stationOrSiteName]
   latitude=[latitude], longitude=[longitude],
   heightOfStationGroundAboveMeanSeaLevel=[heightOfStationGroundAboveMeanSeaLevel],
   [year], [month], [day], [hour], [minute]";

It worked for me (no errors issued) using the two input files:

Z__C_EDZW_20210518110802_bda01,synop_bufr_999999_999999__MW_480.bin
Z__C_EDZW_latest_bda01,synop_bufr_999999_999999__MW_XXX.bin

@shahramn

Can you please try with fewer keys to pin down which one is causing the error?

@guidocioni
Author

> Can you please try with fewer keys to pin down which one is causing the error?

You're right, without stationOrSiteName I can successfully read the BUFR file.

I bet it has to do with the fact that some station names contain weird characters :)

Any workaround to avoid the encoding issue?

@shahramn

Many thanks.
Actually it seems this is because some stationOrSiteName values are MISSING. Normally this is a string e.g. VERLEGENHUKEN.
This looks like a bug in the ecCodes Python bindings. I am investigating further.
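To illustrate the diagnosis, here is a rough sketch of how a missing string could be told apart from a real station name before decoding. It assumes that a missing value arrives as a buffer made up entirely of 0xff bytes, which matches the byte reported in the traceback but is not confirmed in this thread. The helper `decode_station_name` is hypothetical and is not part of ecCodes or pdbufr.

```python
from typing import Optional


def decode_station_name(raw: bytes, encoding: str = "ascii") -> Optional[str]:
    """Hypothetical helper: return None for a missing value, else the name.

    Assumption: a missing string is a non-empty run of 0xff bytes.
    """
    if raw and all(byte == 0xFF for byte in raw):
        return None
    # Real names may be padded with trailing spaces, so strip them.
    return raw.decode(encoding).rstrip()


print(decode_station_name(b"VERLEGENHUKEN   "))
print(decode_station_name(b"\xff" * 16))
```

Returning `None` mirrors the `null` that the JSON dump prints for stations without a name.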

@guidocioni
Author

guidocioni commented May 20, 2021

> Many thanks.
> Actually it seems this is because some stationOrSiteName values are MISSING. Normally this is a string e.g. VERLEGENHUKEN.
> This looks like a bug in the ecCodes Python bindings. I am investigating further

Yes, from the dump of the BUFR file the only weird thing I can see is that some stations have a missing name/type :) (in the JSON dump this is printed as null)

@shahramn

I have confirmed that this is indeed a bug in the underlying ecCodes Python3 interface. I am working on a fix.

@shahramn

shahramn commented May 22, 2021

Can you try the following as a workaround:

from gribapi import *
gribapi.ENC = "unicode-escape"

Then try the rest of your code
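As a rough illustration of why changing the codec helps, the sketch below decodes the same kind of buffer with Python's built-in `unicode-escape` codec, which maps bytes above 0x7f to the code points of the same value instead of raising. The `raw` buffer is again a hypothetical stand-in for a missing value, not data read from a real file.

```python
# Hypothetical stand-in for a missing string value (all bytes 0xff).
raw = b"\xff" * 4

# "unicode-escape" accepts every byte value, so no exception is raised.
decoded = raw.decode("unicode-escape")
print(repr(decoded))

# Ordinary ASCII names decode exactly as before.
normal = b"VERLEGENHUKEN".decode("unicode-escape")
print(normal)
```

Note that under this assumption the workaround only avoids the exception: a missing name comes back as a string of `ÿ` characters rather than as an empty or null value, so it may still need to be filtered out afterwards.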

@guidocioni
Author

> Can you try the following as a workaround:
>
> from gribapi import *
> gribapi.ENC = "unicode-escape"
>
> Then try the rest of your code

yep, that seems to work ;)

@shahramn

The latest Python bindings for ecCodes (v1.3.3) fix this.
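To check whether an environment already has the fixed bindings, something like the following sketch could be used. It assumes the bindings are installed under the distribution name `eccodes`; the helpers `parse_version` and `has_fix` are hypothetical and only compare the first three numeric components of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 3, 3)  # version of the Python bindings named in this thread


def parse_version(text: str) -> tuple:
    """Hypothetical helper: turn '1.3.3' into (1, 3, 3), padding with zeros."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_fix(installed: str) -> bool:
    """Return True if the given version is at least FIXED_IN."""
    return parse_version(installed) >= FIXED_IN


try:
    # Assumption: the bindings are installed as the "eccodes" distribution.
    installed = version("eccodes")
    print(installed, "includes the fix" if has_fix(installed) else "predates the fix")
except PackageNotFoundError:
    print("eccodes Python package not found")
```

If the installed version predates the fix, upgrading the bindings should make the `gribapi.ENC` workaround unnecessary.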
