
to s3 #2

Merged
merged 8 commits into from
Jul 10, 2017

Conversation


@zelima zelima commented Jul 5, 2017

This pull request fixes #1

  • I've added tests to cover the proposed changes
    • partially - see note

Changes proposed in this pull request:

  • helper function to format path (Key) to the file on s3
  • a processor for putting files on s3
  • deps updated for test

@akariv Note:

The PR is not complete: the tests do not check whether the resources actually end up on s3, because I could not find an easy way to pass the resource iterator to mock_processor_test. Some help would be useful.

* removed extra deps for testing
* test for s3 dumper
* processor for dumping to s3
@zelima zelima requested a review from akariv July 5, 2017 21:14
import logging

def generate_path(file_path, base_path='', datapackage={}):
    vers = datapackage.get('version', 'latest')
Member

This feels a bit hard-coded to me.

We could use a simpler (and more powerful):
path = base_path.format(**self.datapackage)
no?

Member

or even:

format_params = {'version': 'latest'}
format_params.update(self.datapackage)
base_path.format(**format_params)
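Pulling the two suggestions above together, a runnable sketch of the helper might look like this. The 'version' default and the template-style base_path follow the review comments; the exact function signature and the path-joining logic are assumptions, not the PR's final code:

```python
# Sketch of the suggested path formatting (an assumption, not the merged
# implementation). base_path is a template such as '{name}/{version}' and
# datapackage is the descriptor dict available to the processor.
def generate_path(file_path, base_path='', datapackage=None):
    datapackage = datapackage or {}
    # Default 'version' to 'latest' so templates referencing it still
    # work when the descriptor omits a version, per the comment above.
    format_params = {'version': 'latest'}
    format_params.update(datapackage)
    base = base_path.format(**format_params)
    # Drop the empty base so keys do not start with a slash.
    return '/'.join(filter(None, [base, file_path]))

print(generate_path('datapackage.json', '{name}/{version}', {'name': 'gdp'}))
# -> gdp/latest/datapackage.json
```

This keeps the hard-coded 'version' fallback but lets any descriptor field appear in the key template.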


def initialize(self, params):
    super(S3Dumper, self).initialize(params)
    self.bucket = params['bucket']
Member

What about AWS access key and secret?

Contributor Author

@akariv boto3 takes care of credentials if they are set up: it looks them up in the AWS config file or in environment variables, see http://boto3.readthedocs.io/en/latest/guide/configuration.html#aws-config-file
But we can have them as part of the spec as well if needed.
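For reference, the two standard credential sources mentioned above can be sketched as follows (all key values are placeholders, not real credentials):

```shell
# Option 1: environment variables, picked up automatically by boto3.
export AWS_ACCESS_KEY_ID="AKIA-PLACEHOLDER"     # placeholder value
export AWS_SECRET_ACCESS_KEY="SECRET-PLACEHOLDER"  # placeholder value
export AWS_DEFAULT_REGION="us-east-1"

# Option 2: shared credentials file (~/.aws/credentials), e.g.:
# [default]
# aws_access_key_id = AKIA-PLACEHOLDER
# aws_secret_access_key = SECRET-PLACEHOLDER
```

Either source is found by boto3 without the pipeline spec carrying credentials.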

mock_processor_test(self.processor_path,
                    (self.params,
                     self.datapackage,
                     iter([])))
Member

@akariv akariv Jul 6, 2017

This should be a list of lists of the resource data. e.g.:

[ 
  [ {'Name': 'a', 'Date': datetime.datetime.now().date()}, ... 
  ] 
]

Contributor Author

@zelima zelima Jul 6, 2017

I'm getting this error in that case

def handle_resources(self, datapackage,
                         resource_iterator,
                         parameters, stats):
        datapackage['count_of_rows'] = 0
        datapackage['hash'] = hashlib.md5()
        for resource in resource_iterator:
>           resource_spec = resource.spec
E           AttributeError: 'list' object has no attribute 'spec'

Member

Okay, try this:

res =  [ {'Name': 'a', 'Date': datetime.datetime.now().date()}, ... ] 
res.spec = datapackage['resources'][0]
res_iter = [res]

We should probably find a wrapper for that.
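Since a plain list raises AttributeError on `res.spec = ...`, the wrapper hinted at above could be a small list subclass. This is only a sketch; `ListResource` is a hypothetical name, not the project's actual test helper:

```python
# A plain list cannot carry attributes, so this minimal wrapper (a
# sketch, not part of the PR) subclasses list to hold the resource
# spec that handle_resources reads via resource.spec.
class ListResource(list):
    def __init__(self, rows, spec):
        super().__init__(rows)
        self.spec = spec

spec = {'name': 'names', 'path': 'data/names.csv'}
res = ListResource([{'Name': 'a'}, {'Name': 'b'}], spec)
res_iter = iter([res])

for resource in res_iter:
    assert resource.spec == spec  # what handle_resources accesses
    assert list(resource) == [{'Name': 'a'}, {'Name': 'b'}]
```

Wrapping each resource this way lets the mocked processor iterate real rows while still exposing the spec attribute the dumper expects.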

tox.ini Outdated
@@ -6,8 +6,8 @@ envlist=

[testenv]
deps=
mock
requests-mock
google-compute-engine
Member

google-compute-engine?

Contributor Author

boto was complaining on Travis

Traceback:
tests/test_to_s3.py:5: in <module>
    from moto import mock_s3
.tox/py/lib/python3.6/site-packages/moto/__init__.py:9: in <module>
    from .autoscaling import mock_autoscaling, mock_autoscaling_deprecated  # flake8: noqa
.tox/py/lib/python3.6/site-packages/moto/autoscaling/__init__.py:2: in <module>
    from .models import autoscaling_backends
.tox/py/lib/python3.6/site-packages/moto/autoscaling/models.py:2: in <module>
    from boto.ec2.blockdevicemapping import BlockDeviceType, BlockDeviceMapping
.tox/py/lib/python3.6/site-packages/boto/__init__.py:1216: in <module>
    boto.plugin.load_plugins(config)
.tox/py/lib/python3.6/site-packages/boto/plugin.py:93: in load_plugins
    _import_module(file)
.tox/py/lib/python3.6/site-packages/boto/plugin.py:75: in _import_module
    return imp.load_module(name, file, filename, data)
.tox/py/lib/python3.6/imp.py:234: in load_module
    return load_source(name, filename, file)
.tox/py/lib/python3.6/imp.py:172: in load_source
    module = _load(spec)
/usr/lib/python2.7/dist-packages/google_compute_engine/boto/compute_auth.py:19: in <module>
    from google_compute_engine import logger
E   ModuleNotFoundError: No module named 'google_compute_engine'

This answer summarises the reason GoogleCloudPlatform/compute-image-packages#262 (comment)

* passing the actual iterator to the mocked processor
* reading rows to execute the iterator
* getting s3 keys in a simpler way
* more useful assertions
@akariv akariv merged commit 2a8de94 into master Jul 10, 2017
@zelima zelima deleted the feature/to_s3 branch July 10, 2017 10:44
Development

Successfully merging this pull request may close these issues.

dump to s3
2 participants