-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
to s3 #2
to s3 #2
Conversation
* removed extra deps for testing * test for s3 dumper * processor for dumping to s3
datapackage_pipelines_aws/helpers.py
Outdated
import logging | ||
|
||
def generate_path(file_path, base_path='', datapackage={}): | ||
vers = datapackage.get('version', 'latest') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit hard-coded to me.
We could use a simpler (and more powerful):
path = base_path.format(**self.datapackage)
no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even:
format_params = {'version': 'latest'}
format_params.update(self.datapackage)
base_path.format(**format_params)
|
||
def initialize(self, params): | ||
super(S3Dumper, self).initialize(params) | ||
self.bucket = params['bucket'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about AWS access key and secret?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akariv boto3 takes care of credential if they are set-up. It looks up in aws config file or searches for ENV variables http://boto3.readthedocs.io/en/latest/guide/configuration.html#aws-config-file
But we can have them as a part of the spec as well if that is a case
tests/test_to_s3.py
Outdated
mock_processor_test(self.processor_path, | ||
(self.params, | ||
self.datapackage, | ||
iter([]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a list of lists of the resource data. e.g.:
[
[ {'Name': 'a', 'Date': datetime.datetime.now().date()}, ...
]
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting this error in that case
def handle_resources(self, datapackage,
resource_iterator,
parameters, stats):
datapackage['count_of_rows'] = 0
datapackage['hash'] = hashlib.md5()
for resource in resource_iterator:
> resource_spec = resource.spec
E AttributeError: 'list' object has no attribute 'spec'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, try this:
res = [ {'Name': 'a', 'Date': datetime.datetime.now().date()}, ... ]
res.spec = datapackage['resources'][0]
res_iter = [res]
We should probably find a wrapper for that.
tox.ini
Outdated
@@ -6,8 +6,8 @@ envlist= | |||
|
|||
[testenv] | |||
deps= | |||
mock | |||
requests-mock | |||
google-compute-engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
google-compute-engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boto was complaining on Travis
Traceback:
tests/test_to_s3.py:5: in <module>
from moto import mock_s3
.tox/py/lib/python3.6/site-packages/moto/__init__.py:9: in <module>
from .autoscaling import mock_autoscaling, mock_autoscaling_deprecated # flake8: noqa
.tox/py/lib/python3.6/site-packages/moto/autoscaling/__init__.py:2: in <module>
from .models import autoscaling_backends
.tox/py/lib/python3.6/site-packages/moto/autoscaling/models.py:2: in <module>
from boto.ec2.blockdevicemapping import BlockDeviceType, BlockDeviceMapping
.tox/py/lib/python3.6/site-packages/boto/__init__.py:1216: in <module>
boto.plugin.load_plugins(config)
.tox/py/lib/python3.6/site-packages/boto/plugin.py:93: in load_plugins
_import_module(file)
.tox/py/lib/python3.6/site-packages/boto/plugin.py:75: in _import_module
return imp.load_module(name, file, filename, data)
.tox/py/lib/python3.6/imp.py:234: in load_module
return load_source(name, filename, file)
.tox/py/lib/python3.6/imp.py:172: in load_source
module = _load(spec)
/usr/lib/python2.7/dist-packages/google_compute_engine/boto/compute_auth.py:19: in <module>
from google_compute_engine import logger
E ModuleNotFoundError: No module named 'google_compute_engine'
This answer summarises the reason GoogleCloudPlatform/compute-image-packages#262 (comment)
* passing the actuall iterator to mocked processor * reading rows to execute iterator * getting s3 keys with simpler way * more usefull assertions
This pull request fixes #1
Changes proposed in this pull request:
@akariv Note:
The PR is not complete - the tests are not checking if resources are on s3 as well, cause I could not find an easy way to pass the resource iterator to
mock_processor_test
. Some help would be useful