Hdfs hook dependency on snakebite throwing error #77

Open
kanirudhumar92 opened this issue Mar 28, 2017 · 14 comments

Comments

@kanirudhumar92

[2017-03-28 14:53:07,932] {models.py:266} ERROR - Failed to import: /usr/local/lib/python3.5/site-packages/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/airflow/models.py", line 263, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/lib/python3.5/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 693, in _load
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 673, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/usr/local/lib/python3.5/site-packages/airflow/example_dags/example_http_operator.py", line 20, in <module>
    from airflow.operators.sensors import HttpSensor
  File "/usr/local/lib/python3.5/site-packages/airflow/operators/sensors.py", line 33, in <module>
    from airflow.hooks.hdfs_hook import HDFSHook
  File "/usr/local/lib/python3.5/site-packages/airflow/hooks/hdfs_hook.py", line 20, in <module>
    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
  File "/usr/local/lib/python3.5/site-packages/snakebite/client.py", line 1473
    baseTime = min(time * (1L << retries), cap);
                            ^
SyntaxError: invalid syntax
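
For anyone reading the traceback: the `1L` in the last frame is a Python 2 long-integer literal, a form Python 3 does not accept, so the module fails at parse time before any snakebite code runs. A minimal sketch that reproduces this using only the line quoted above. The helper name `parses_on_this_interpreter` and the rewritten `py3_line` are illustrative assumptions, not taken from snakebite.

```python
import ast


def parses_on_this_interpreter(source):
    """Return True if `source` is syntactically valid for the running Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# The exact line from snakebite/client.py shown in the traceback.
py2_line = "baseTime = min(time * (1L << retries), cap);"

# Illustrative rewrite: Python 3 has a single int type, so the suffix is dropped.
py3_line = "baseTime = min(time * (1 << retries), cap);"

print("original line parses:", parses_on_this_interpreter(py2_line))
print("rewritten line parses:", parses_on_this_interpreter(py3_line))
```

Run under Python 3, the original line is rejected and the rewritten one is accepted. Only parsing is checked here, so the undefined names `time`, `retries`, and `cap` do not matter.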

@alexthehurst

Hi @Anirudh-zemoso,

Did you modify the Dockerfile to install Python 3 instead? I ask because Snakebite appears to be incompatible with Python 3, which is the primary reason that @puckel hasn't added Python 3 support to docker-airflow. At least, that's if I'm reading this right: #74

Based on this snakebite issue, it looks like it may be a long time until Snakebite is upgraded, if it ever is: spotify/snakebite#62

@puckel, I've been optimistically watching for a solution to the Python 3 issue (since I'd like to base my ETL system on your Airflow distribution). Do you think the only way around the Snakebite dependency would be dropping HDFS support entirely? (I'd be in favor of that since I don't plan to use it, but I can see how this could be a problem for many users.)

@kanirudhumar92
Author

Yes, I'm using Python 3. Thanks, I'm not using HDFSHook now.

@ghost

ghost commented Jun 15, 2017

Does this mean Airflow currently does not work with Python 3? I am failing to initialize the db with Airflow installed this way.

@LanDeQuHuXi

@revolucion09
No, it's just that the HDFS hook doesn't work with Python 3.

Even with this error message, as long as you are not using HDFS with Airflow, it should be OK.
I've been running Airflow on Docker with Python 3 for a while now, and everything works.

@ghost

ghost commented Aug 15, 2017

I stumbled on another thing that doesn't work for the same reason. Using the TimeSensor in Airflow also imports the HDFSHook. That import depends on snakebite, which fails under Python 3.

Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/airflow/models.py", line 264, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/lib/python3.5/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 693, in _load
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/airflow/airflow/dags/etl_hadoop_out_daily.py", line 6, in <module>
    from airflow.operators.sensors import TimeSensor
  File "/usr/lib/python3.5/site-packages/airflow/operators/sensors.py", line 32, in <module>
    from airflow.hooks.hdfs_hook import HDFSHook
  File "/usr/lib/python3.5/site-packages/airflow/hooks/hdfs_hook.py", line 20, in <module>
    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
  File "/usr/lib/python3.5/site-packages/snakebite/client.py", line 1473
    baseTime = min(time * (1L << retries), cap);
                            ^
SyntaxError: invalid syntax

I will also report this bug on the official repo.

@ghost

ghost commented Aug 28, 2017

This issue is open, but I see a commit 87db6f5 that changes the docker-airflow project to use Python 3. It runs, but the logs show an error, as @c75 has pointed out. The error happens with the example DAGs too if you have those switched on. Running initdb loads them into the DagBag, which then throws that same error.

Sadly, I see a commit in the snakebite project from 3 years ago that adds Python 3 support, yet subsequent commits have added code that is not Python 3 compatible. I've added some notes there.

I want to use S3 and EMR with Airflow and Python 3, and I'm wondering whether this is a showstopper for that.

@ghost ghost mentioned this issue Aug 28, 2017
@ghost

ghost commented Aug 28, 2017

In the airflow code, there is exception handling that sets a flag when snakebite is not installed, and as a consequence, code that is not actually using snakebite will not cause errors.

So basically I added a step in my Dockerfile that uninstalls snakebite right after installing Airflow and the Airflow modules I need. That allowed me to use TimeSensor, and probably the other sensors as well.

So that could work for you also on EMR. In theory, only the code dealing with HDFS actually needs Snakebite so I expect the other code to work under Python3 when snakebite is not installed.
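
The flag pattern described above can be sketched roughly like this. It is an illustration, not Airflow's actual source: the function `guarded_import` and both module names are hypothetical, and it assumes the guard catches only `ImportError`, which would be consistent with the tracebacks in this thread. Under that assumption a missing package just sets the flag to false, while an installed package containing Python 2-only syntax raises `SyntaxError` straight through the guard. That would explain why uninstalling helps.

```python
import importlib
import os
import sys
import tempfile


def guarded_import(module_name):
    """Optional-dependency guard: True if the import worked, False if the
    module is missing. Only ImportError is caught (assumed, see above)."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Case 1: the package is not installed. The guard quietly reports False.
missing_flag = guarded_import("hypothetical_missing_hdfs_client")

# Case 2: the package is installed but contains a Python 2 long literal.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "hypothetical_py2_only_client.py"), "w") as f:
    f.write("x = 1L << 3\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

try:
    guarded_import("hypothetical_py2_only_client")
    syntax_error_escaped = False
except SyntaxError:
    # SyntaxError is not a subclass of ImportError, so the guard misses it.
    syntax_error_escaped = True
finally:
    sys.path.remove(tmp_dir)

print("missing package flag:", missing_flag)
print("SyntaxError escaped the guard:", syntax_error_escaped)
```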

@ghost

ghost commented Aug 28, 2017

Thanks for that suggestion; it gives me some hope before I give up on Python 3 for Airflow. In my case I have no real need to interface directly with HDFS, as long as I can run spark submit. Meanwhile, it seems like a shame that people go to the trouble of open sourcing a tool only to have people uninstall it because it's not kept modern. I know I could contribute to that project by modernising it, but when most of the committers at Spotify have no incentive to keep it that way, it seems futile. Maybe they will switch to Python 3 internally at some point. Apart from legacy library support, there's really no justification anymore for sticking with Python 2; Python 3 is mature these days.

@ghost

ghost commented Aug 28, 2017

Maybe it's possible to fork it and make a Python 3 version, but it's too much work to keep it up to date if Spotify is adding a lot of new code to Snakebite for Python 2...

Anyway, I'm pretty sure you will get it to work by simply uninstalling snakebite. If you run into some non-HDFS-related thing that doesn't work, please add it to this thread so we all know. :)

@tedmiston

tedmiston commented Jan 11, 2018

I wonder if it would be possible to swap out the dependency of HDFSHook on snakebite in favor of an alternative HDFS client/wrapper?

This blog post from @wesm makes it sound like the pure C++ libhdfs3, now part of Apache HAWQ, could be a candidate. Perhaps he would know more about whether this is a feasible idea.

http://wesmckinney.com/blog/python-hdfs-interfaces/

There's also an open discussion from 2015 on doing something like that in snakebite at spotify/snakebite#145.

Edit: It looks like someone added a WebHDFSHook with apache/airflow#604 in 2015 which wraps hdfscli. I'm not sure if this is a complete replacement for the other HDFS hook as both still seem to be maintained.

@wesm

wesm commented Jan 11, 2018

The HAWQ developers have advised us against relying on libhdfs3 for any production software. My understanding is that the best option continues to be the JNI-based libhdfs C library.

@tedmiston

Okay, so I'm trying to determine if the HdfsCLI Python package is built on libhdfs or something else. Or perhaps the fact that it uses WebHDFS / HttpFS means it doesn't even require a native client locally.

https://hdfscli.readthedocs.io/en/latest/

https://github.com/mtth/hdfs

(This is all pretty new to me.)
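
On the WebHDFS point: WebHDFS is a REST API served over HTTP, so a client needs only an HTTP library and no native HDFS bindings. A minimal sketch of how such a request URL is formed, using only the standard library and making no network call. The helper `webhdfs_url`, the host name, and the port are assumptions for illustration, and this is not how HdfsCLI is implemented internally.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form
    http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&<other params>."""
    query = urlencode({"op": op, **params})
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, quote(hdfs_path), query
    )


# Hypothetical NameNode address; substitute your cluster's HTTP endpoint.
url = webhdfs_url("namenode.example.com", 50070, "/user/airflow", "LISTSTATUS")
print(url)
```

An HTTP GET on a URL like this asks the NameNode to list a directory, which is why a WebHDFS-based hook can avoid the snakebite dependency.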

@elukey

elukey commented Oct 28, 2019

I didn't see it mentioned before, so I'm adding this: https://pypi.org/project/snakebite-py3/

It seems that the internetarchive project is maintaining a new Python 3 version of snakebite. Any plans to use it?

@RomHartmann

If you're upgrading airflow, unfreeze all your pip dependencies (then freeze them again). It's possible some of them are still pulling in snakebite.
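
One way to check which installed packages still declare snakebite as a dependency is sketched below with the standard library's `importlib.metadata` (Python 3.8+). The function name `dists_requiring` is hypothetical, and the name matching is deliberately simple: it lowercases names but does not normalise `-` versus `_`.

```python
import re
from importlib import metadata


def dists_requiring(package_name):
    """Return sorted names of installed distributions whose declared
    requirements mention `package_name` (including optional extras)."""
    wanted = package_name.lower()
    found = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # Requirement strings look like "snakebite>=2.7.8; extra == 'hdfs'".
            # Keep only the leading project name.
            name = re.split(r"[\s;<>=!~\[\(]", req, maxsplit=1)[0].lower()
            if name == wanted:
                found.add(dist.metadata.get("Name", "<unknown>"))
    return sorted(found)


print(dists_requiring("snakebite"))
```

An empty list means nothing installed declares it. Anything listed is a candidate to unpin or upgrade before freezing again.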


7 participants