This repository has been archived by the owner on Oct 20, 2020. It is now read-only.

Problem with importing azure.eventhub.extensions.checkpointstoreblob #56

Closed
pjachowi opened this issue Aug 18, 2020 · 4 comments

@pjachowi

I am not sure whether this problem should be fixed in the Azure package design or can be fixed in this project.

I have a problem with azure.eventhub.extensions.checkpointstoreblob. Attempting to import it ends with an error:

$ bazel run :main
INFO: Analyzed target //:main (18 packages loaded, 672 targets configured).
INFO: Found 1 target...
Target //:main up-to-date:
  bazel-bin/main
INFO: Elapsed time: 5.008s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Traceback (most recent call last):
  File "/home/pjachowi/.cache/bazel/_bazel_pjachowi/0be77f5cbe63ebeec7251d4b492b30cb/execroot/example_repo/bazel-out/k8-fastbuild/bin/main.runfiles/example_repo/main.py", line 1, in <module>
    from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore
ModuleNotFoundError: No module named 'azure.eventhub.extensions.checkpointstoreblob'

With my limited knowledge of the Python packaging machinery, here is what I suspect: azure-eventhub-checkpointstoreblob depends on azure-eventhub. The azure-eventhub package contains a directory azure/eventhub/extensions holding only an __init__.py, which terminates the search for the azure.eventhub.extensions package:

$ ls ./bazel-bin/main.runfiles/example_repo/external/pip/pypi__azure_eventhub/azure/eventhub/extensions
__init__.py
$ ls ./bazel-bin/main.runfiles/example_repo/external/pip/pypi__azure_eventhub_checkpointstoreblob/azure/eventhub/extensions
checkpointstoreblob  __init__.py

To reproduce the problem, create four files in the same directory: WORKSPACE, BUILD, requirements.txt, and main.py.

WORKSPACE (same as in the example directory):

workspace(name = "example_repo")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_python",
    url = "https://github.com/bazelbuild/rules_python/releases/download/0.0.2/rules_python-0.0.2.tar.gz",
    strip_prefix = "rules_python-0.0.2",
    sha256 = "b5668cde8bb6e3515057ef465a35ad712214962f0b3a314e551204266c7be90c",
)

load("@rules_python//python:repositories.bzl", "py_repositories")

py_repositories()

local_repository(
    name = "rules_python_external",
    path = "../",
)

load("@rules_python_external//:repositories.bzl", "rules_python_external_dependencies")

rules_python_external_dependencies()

load("@rules_python_external//:defs.bzl", "pip_install")

pip_install(
    requirements = "//:requirements.txt",
)

BUILD:

load("@pip//:requirements.bzl", "requirement")

py_binary(
    name = "main",
    srcs = ["main.py"],
    deps = [
        requirement("azure-eventhub-checkpointstoreblob"),
    ],
)

requirements.txt:

azure-eventhub-checkpointstoreblob==1.1.0

main.py:

from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

if __name__ == "__main__":
    pass

@dillon-giacoppo
Owner

dillon-giacoppo commented Aug 18, 2020

Yep, unfortunately they are not following the namespace package conventions.

The culprit is indeed azure/eventhub/extensions/__init__.py, as verified by inspecting the wheel:

unzip -p azure_eventhub-5.1.0-py2.py3-none-any.whl | grep azure/eventhub/extensions/__init__.py
azure/eventhub/extensions/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0

From the namespace docs:

It is extremely important that every distribution that uses the namespace package omits the __init__.py or uses a pkgutil-style __init__.py. If any distribution does not, it will cause the namespace logic to fail and the other sub-packages will not be importable.

We cannot fix this on our side because, unlike pip, we don't install all the packages into a single directory. The Azure packages currently seem to depend on that behaviour to function.
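The failure mode described above can be reproduced without Bazel or the Azure packages. The following is a minimal sketch under stated assumptions: `demo_ns`, `core`, and `ext` are hypothetical stand-ins for a shared namespace and two distributions unpacked into separate directories, and the layout is simplified to a single namespace level.

```python
import importlib
import os
import sys
import tempfile


def make_dist(root, name, files):
    """Create a fake unpacked distribution holding the given (empty) files."""
    dist = os.path.join(root, name)
    for rel in files:
        path = os.path.join(dist, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("")
    return dist


def can_import(module_name, search_dirs):
    """Try to import module_name with search_dirs placed first on sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path[:0] = search_dirs
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Restore the interpreter state so the two scenarios do not interfere.
        sys.path[:] = saved_path
        for key in set(sys.modules) - saved_modules:
            del sys.modules[key]


# Scenario 1: one distribution ships demo_ns/__init__.py (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    core = make_dist(root, "core", ["demo_ns/__init__.py"])
    ext = make_dist(root, "ext", ["demo_ns/plugin/__init__.py"])
    broken = can_import("demo_ns.plugin", [core, ext])

# Scenario 2: no distribution ships demo_ns/__init__.py.
with tempfile.TemporaryDirectory() as root:
    core = make_dist(root, "core", ["demo_ns/core.py"])
    ext = make_dist(root, "ext", ["demo_ns/plugin/__init__.py"])
    fixed = can_import("demo_ns.plugin", [core, ext])

print(broken, fixed)
```

In the first scenario `demo_ns` resolves to a regular package whose `__path__` covers only the `core` directory, so `demo_ns.plugin` is not found. In the second, `demo_ns` becomes a namespace package spanning both directories and the import succeeds.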

Interestingly, the wheel for checkpointstoreblob does not have an __init__.py (even though one is specified in the source):

unzip -p azure_eventhub_checkpointstoreblob-1.1.0-py2.py3-none-any.whl | grep azure/eventhub/extensions/__init__.py

Both __init__ files should be removed from the respective wheels. This should be a trivial fix upstream.
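For anyone who prefers to check a wheel from Python, the inspection above could be approximated with the standard `zipfile` module. This is only a sketch: `has_init` and `_write_fake_wheel` are hypothetical helpers, the demo uses stand-in archives rather than the real Azure wheels, and it looks at the archive member list instead of grepping the RECORD contents as the `unzip -p` commands do.

```python
import os
import tempfile
import zipfile


def has_init(wheel_path, package_dir):
    """Return True if the archive contains package_dir/__init__.py."""
    target = package_dir.strip("/") + "/__init__.py"
    with zipfile.ZipFile(wheel_path) as wheel:
        return target in wheel.namelist()


def _write_fake_wheel(path, members):
    """Write a stand-in archive with empty members (demo only, not a real wheel)."""
    with zipfile.ZipFile(path, "w") as wheel:
        for member in members:
            wheel.writestr(member, "")


with tempfile.TemporaryDirectory() as tmp:
    with_init = os.path.join(tmp, "with_init.whl")
    without_init = os.path.join(tmp, "without_init.whl")
    _write_fake_wheel(with_init, ["pkg/ext/__init__.py"])
    _write_fake_wheel(without_init, ["pkg/ext/plugin/__init__.py"])
    results = (has_init(with_init, "pkg/ext"), has_init(without_init, "pkg/ext"))

print(results)
```

Against a downloaded wheel, the call would look like `has_init("azure_eventhub-5.1.0-py2.py3-none-any.whl", "azure/eventhub/extensions")`.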

@pjachowi
Author

Thank you @dillon-giacoppo for the reply! Is there any workaround I can apply, apart from asking Azure to follow the namespace convention?

@dillon-giacoppo
Owner

There is no simple workaround that I know of, because you literally have to remove that file to make it work. As noted in PEP 420, the resolver does the following:

During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named "foo", for each directory in the parent path:

If <directory>/foo/__init__.py is found, a regular package is imported and returned.
If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extension varies by platform and whether the -O flag is specified. The list here is representative.
If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
Otherwise the scan continues with the next directory in the parent path.
If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created. The new namespace package:

Has a __path__ attribute set to an iterable of the path strings that were found and recorded during the scan.
Does not have a __file__ attribute.

So it will always return the regular package from azure_eventhub due to the first rule. Even if you switched the order on the path so that azure_eventhub_checkpointstoreblob came before azure_eventhub, it would not help.
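To make the ordering argument concrete, here is a rough sketch of the scan quoted above. It is a simplification, not CPython's actual import machinery: `scan` is a hypothetical helper that only checks `__init__.py`, `.py` files, and directories, and `dist_a` and `dist_b` are made-up directory names.

```python
import os
import tempfile


def scan(parent_path, name):
    """Simplified PEP 420 scan: report what importing `name` would resolve to."""
    recorded = []
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            # First rule: a regular package wins immediately.
            return ("regular package", [candidate])
        if os.path.isfile(candidate + ".py"):
            return ("module", [candidate + ".py"])
        if os.path.isdir(candidate):
            # A bare directory is only recorded; the scan keeps going.
            recorded.append(candidate)
    if recorded:
        return ("namespace package", recorded)
    return None


with tempfile.TemporaryDirectory() as root:
    dist_a = os.path.join(root, "dist_a")  # ships foo/__init__.py
    dist_b = os.path.join(root, "dist_b")  # ships only foo/sub/
    os.makedirs(os.path.join(dist_a, "foo"))
    os.makedirs(os.path.join(dist_b, "foo", "sub"))
    with open(os.path.join(dist_a, "foo", "__init__.py"), "w") as f:
        f.write("")

    forward = scan([dist_a, dist_b], "foo")
    backward = scan([dist_b, dist_a], "foo")
    only_b = scan([dist_b], "foo")

print(forward[0], backward[0], only_b[0])
```

Both orderings resolve `foo` to the regular package in `dist_a`. The directory from `dist_b` only contributes when no `__init__.py` exists anywhere on the path.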

@pjachowi
Author

@dillon-giacoppo Thank you very much!
