Skip to content

Commit

Permalink
Improve harvester error handling (#7886)
Browse files Browse the repository at this point in the history
* Bump urllib3 from 1.26.2 to 1.26.3 (#6908)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.2 to 1.26.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.3/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.2...1.26.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Toni <toni.schoenbuchner@csgis.de>

* [Fixes #6880] Circle CI upload tests fail irregulary (#6881)

* [Fixes #6880] Circle CI upload tests fail irregulary

* CircleCI test fix: sometimes expires due to upload timeout in the test environment

* - Avoid infinite loop on upload testing

* Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment"

This reverts commit 66139fd.

Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>

* [Fixes #6914] Remove "add to basket" tool for documents and maps (#6915)

* Added malnajdi as contributor

* [Fixes #6910] meaningful filename for document download (#6911)

* get meaningful document filenames on download

* - Strip extension from document title before slugify it (e.g.: image.jpg instead of imagejpg.jpg)

Co-authored-by: afabiani <alessio.fabiani@gmail.com>
Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>

* - CircleCI Upload Tests: trying to reduce more the risk of infinite loop on "wait_for_progress"

* [Fixes #6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION (#6913)

* [Fixes #6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION

* [Fixes #6916] unit test for GEOSERVER_LOCATION

* Bump django-cors-headers from 3.6.0 to 3.7.0 (#6901)

Bumps [django-cors-headers](https://github.com/adamchainz/django-cors-headers) from 3.6.0 to 3.7.0.
- [Release notes](https://github.com/adamchainz/django-cors-headers/releases)
- [Changelog](https://github.com/adamchainz/django-cors-headers/blob/master/HISTORY.rst)
- [Commits](adamchainz/django-cors-headers@3.6.0...3.7.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump amqp from 5.0.3 to 5.0.5 (#6905)

Bumps [amqp](https://github.com/celery/py-amqp) from 5.0.3 to 5.0.5.
- [Release notes](https://github.com/celery/py-amqp/releases)
- [Changelog](https://github.com/celery/py-amqp/blob/master/Changelog)
- [Commits](celery/py-amqp@v5.0.3...v5.0.5)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pip from 21.0 to 21.0.1 (#6900)

Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1.
- [Release notes](https://github.com/pypa/pip/releases)
- [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst)
- [Commits](pypa/pip@21.0...21.0.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump coverage from 5.3.1 to 5.4 (#6903)

Bumps [coverage](https://github.com/nedbat/coveragepy) from 5.3.1 to 5.4.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@coverage-5.3.1...coverage-5.4)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pytest from 6.2.1 to 6.2.2 (#6907)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.1 to 6.2.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@6.2.1...6.2.2)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump djangorestframework-gis from 0.16 to 0.17 (#6902)

Bumps [djangorestframework-gis](https://github.com/openwisp/django-rest-framework-gis) from 0.16 to 0.17.
- [Release notes](https://github.com/openwisp/django-rest-framework-gis/releases)
- [Changelog](https://github.com/openwisp/django-rest-framework-gis/blob/master/CHANGES.rst)
- [Commits](openwisp/django-rest-framework-gis@v0.16.0...v0.17.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* - Algin setup.cfg to requirements.txt

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has… (#6923)

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded

* - Add REST APIs test suite to CircleCI

* [Fixes #6918] Removal of QGIS support (#6919)

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* - Fix LGTM issues

* allow Basic authenticated requests in LOCKDOWN mode

* fix to avoid circular import

* flake8 check fix

* added tests

* [Fixes #6880] Circle CI upload tests fail irregulary (#6881)

* [Fixes #6880] Circle CI upload tests fail irregulary

* CircleCI test fix: sometimes expires due to upload timeout in the test environment

* - Avoid infinite loop on upload testing

* Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment"

This reverts commit 66139fd.

Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>

* [Fixes #6914] Remove "add to basket" tool for documents and maps (#6915)

* Added malnajdi as contributor

* Bump pip from 21.0 to 21.0.1 (#6900)

Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1.
- [Release notes](https://github.com/pypa/pip/releases)
- [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst)
- [Commits](pypa/pip@21.0...21.0.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* - Algin setup.cfg to requirements.txt

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has… (#6923)

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded

* - Add REST APIs test suite to CircleCI

* [Fixes #6918] Removal of QGIS support (#6919)

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* - Fix LGTM issues

* allow Basic authenticated requests in LOCKDOWN mode

* fix to avoid circular import

* - Align to upstream master branch

* Improve error handling on harvesting and add more robust check for GeoNode harvester availability

* Fix minor pep8 problem

Co-authored-by: Giovanni Allegri <giohappy@gmail.com>
Co-authored-by: allyoucanmap <bovio.stefano@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Toni <toni.schoenbuchner@csgis.de>
Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>
Co-authored-by: Florian Hoedt <gannebamm@gmail.com>
Co-authored-by: Mohammed Y. Alnajdi <mohdnagfy@gmail.com>
Co-authored-by: biegan <bieganowski.rev@gmail.com>
  • Loading branch information
10 people committed Jul 29, 2021
1 parent d605bcd commit 8a2bc14
Show file tree
Hide file tree
Showing 3 changed files with 53 additions and 27 deletions.
18 changes: 16 additions & 2 deletions geonode/harvesting/harvesters/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,10 @@
logger = logging.getLogger(__name__)


class HarvestingException(Exception):
pass


@dataclasses.dataclass()
class BriefRemoteResource:
unique_identifier: str
Expand Down Expand Up @@ -85,14 +89,24 @@ def from_django_record(cls, harvester: "Harvester"): # noqa

@abc.abstractmethod
def get_num_available_resources(self) -> int:
"""Return the number of available resources on the remote service"""
"""Return the number of available resources on the remote service.
If there is a problem retrieving the number of available resource, this
method shall raise `HarvestingException`.
"""

@abc.abstractmethod
def list_resources(
self,
offset: typing.Optional[int] = 0
) -> typing.List[BriefRemoteResource]:
"""Return a list of resources from the remote service"""
"""Return a list of resources from the remote service.
If there is a problem listing resource, this method shall
raise `HarvestingException`.
"""

@abc.abstractmethod
def check_availability(self, timeout_seconds: typing.Optional[int] = 5) -> bool:
Expand Down
12 changes: 10 additions & 2 deletions geonode/harvesting/harvesters/geonode.py
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,14 @@ def check_availability(self, timeout_seconds: typing.Optional[int] = 5) -> bool:
except (requests.HTTPError, requests.ConnectionError):
result = False
else:
result = True
try:
response_payload = response.json()
except json.JSONDecodeError:
logger.exception("Could not decode server response as valid JSON")
result = False
else:
layers_endpoint_present = response_payload.get("layers") is not None
result = layers_endpoint_present
return result

def get_geonode_resource_type(self, remote_resource_type: str) -> ResourceBase:
Expand Down Expand Up @@ -410,8 +417,9 @@ def _get_total_records(
else:
try:
result = response.json().get("meta", {}).get("total_count", 0)
except json.JSONDecodeError:
except json.JSONDecodeError as exc:
logger.exception("Could not decode response as a JSON object")
raise base.HarvestingException(str(exc))
return result

def _get_resource_descriptor(
Expand Down
50 changes: 27 additions & 23 deletions geonode/harvesting/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
models,
utils,
)
from .harvesters.base import BaseHarvesterWorker
from .harvesters import base

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -110,7 +110,7 @@ def _harvest_resource(
"""Harvest a single resource from the input harvestable resource id"""
harvestable_resource = models.HarvestableResource.objects.get(
pk=harvestable_resource_id)
worker: BaseHarvesterWorker = harvestable_resource.harvester.get_harvester_worker()
worker: base.BaseHarvesterWorker = harvestable_resource.harvester.get_harvester_worker()
harvested_resource_info = worker.get_resource(
harvestable_resource, harvesting_session_id)
if worker.should_copy_resource(harvestable_resource):
Expand Down Expand Up @@ -209,9 +209,9 @@ def update_harvestable_resources(self, harvester_id: int):
worker = harvester.get_harvester_worker()
try:
num_resources = worker.get_num_available_resources()
except NotImplementedError:
except (NotImplementedError, base.HarvestingException) as exc:
_handle_harvestable_resources_update_error(
self.request.id, harvester_id=harvester_id)
self.request.id, harvester_id=harvester_id, raised_exception=exc)
else:
page_size = 10
total_pages = math.ceil(num_resources / page_size)
Expand Down Expand Up @@ -245,23 +245,27 @@ def _update_harvestable_resources_batch(
harvester = models.Harvester.objects.get(pk=harvester_id)
worker = harvester.get_harvester_worker()
offset = page * page_size
found_resources = worker.list_resources(offset)
for remote_resource in found_resources:
resource, created = models.HarvestableResource.objects.get_or_create(
harvester=harvester,
unique_identifier=remote_resource.unique_identifier,
title=remote_resource.title,
defaults={
"should_be_harvested": harvester.harvest_new_resources_by_default,
"remote_resource_type": remote_resource.resource_type,
"last_refreshed": now()
}
)
# NOTE: make sure to save the resource because we need to have its
# `last_updated` property be refreshed - this is done in order to be able
# to compare when a resource has been found
resource.last_refreshed = now()
resource.save()
try:
found_resources = worker.list_resources(offset)
except base.HarvestingException:
logger.exception("Could not retrieve list of remote resources.")
else:
for remote_resource in found_resources:
resource, created = models.HarvestableResource.objects.get_or_create(
harvester=harvester,
unique_identifier=remote_resource.unique_identifier,
title=remote_resource.title,
defaults={
"should_be_harvested": harvester.harvest_new_resources_by_default,
"remote_resource_type": remote_resource.resource_type,
"last_refreshed": now()
}
)
# NOTE: make sure to save the resource because we need to have its
# `last_updated` property be refreshed - this is done in order to be able
# to compare when a resource has been found
resource.last_refreshed = now()
resource.save()


@app.task(
Expand Down Expand Up @@ -289,7 +293,6 @@ def _finish_harvestable_resources_update(self, harvester_id: int):
ignore_result=False,
)
def _handle_harvestable_resources_update_error(self, task_id, *args, **kwargs):
logger.debug("Inside the handle_harvestable_resources_update_error task ---------------------------------------------------------------------------------------------")
result = self.app.AsyncResult(str(task_id))
print(f"locals: {locals()}")
print(f"state: {result.state}")
Expand All @@ -300,7 +303,8 @@ def _handle_harvestable_resources_update_error(self, task_id, *args, **kwargs):
now_ = now()
harvester.last_checked_harvestable_resources = now_
harvester.last_check_harvestable_resources_message = (
f"{now_} - There was an error retrieving information on available resources. "
f"{now_} - There was an error retrieving information on available "
f"resources: {result.traceback} - {args} {kwargs}"
f"Please check the logs"
)
harvester.save()
Expand Down

0 comments on commit 8a2bc14

Please sign in to comment.